INTRODUCTION


The alphaviruses are a widespread group of human pathogens that are present virtually everywhere in the world (Griffin, 1986; Monath, 1988; Peters and Dalrymple, 1990). They are mosquito-borne viruses and thus are particularly prevalent in tropical areas where mosquitoes abound and problems of overwintering by the virus do not arise, but are also present in temperate areas of the world including the United States. They have the capacity to replicate in the mosquito vector as well as in human hosts or in various species of birds and mammals. Old World alphaviruses are, in general, capable of causing a painful and disabling disease in man characterized by fever, rash and arthralgia. In the cases of the Ockelbo strain of Sindbis virus and of Ross River virus, this arthralgia manifests as a polyarthritis that may in some cases last for months or years. Many of the New World alphaviruses can cause fatal encephalitis in man. Our program attempts to understand the molecular basis of alphavirus immunogenicity and to determine the relationships of alphaviruses to one another, and has developed in collaboration with Drs. Alan Schmaljohn and Joel Dalrymple of USAMRIID.


METHODS  USED 

Virus  Strains.  Viruses  used  in  this  study  were  from  the  collection  of  Dr.  J. 
M.  Dalrymple  of  USAMRIID.  Viruses  were  grown  in  BHK  cells,  in  secondary 
chicken  embryo  fibroblast  cells,  or  in  mosquito  cells,  purified,  and  RNA  prepared 
as  described  (Ou  et  al.,  1981;  Shirako  et  al.,  1991). 
cDNA Clones. cDNA clones were made in one of two ways. The first method used standard procedures in which first strand cDNA was made using oligo(dT) as primer and second strand synthesis was by the method of Gubler and Hoffman (Gubler and Hoffman, 1983; Sambrook et al., 1989). These cloning
methods,  as  well  as  the  methods  of  DNA  sequencing  and  RNA  sequencing,  have 
been  described  in  numerous  publications  from  our  laboratory  over  the  years 
(Hahn  et  al.,  1985;  Rice  et  al.,  1985;  Rice  and  Strauss,  1981;  Shirako  et  al.,  1991; 
Strauss  et  al.,  1984). 

In  a  second  approach,  we  developed  methods  suitable  for  high  throughput 
automated  DNA  sequencing,  in  order  to  speed  up  the  acquisition  of  sequence  data. 
Whataroa  virus  was  chosen  as  a  test  virus.  The  methods  were  described  in  detail 
in  our  annual  report  of  4/23/92.  Briefly,  first  strand  cDNA  synthesis  used  random 
priming  and  second  strand  cDNA  was  synthesized  by  the  method  of  Gubler  and 
Hoffman  (Gubler  and  Hoffman,  1983).  After  blunt-ending  the  double-stranded 
cDNA, the internal EcoRI restriction sites were methylated and the DNA was electrophoresed in an agarose gel. EcoRI linkers were attached to the 2-4 kb fraction and the DNA cloned in the EcoRI site of a suitable vector. One hundred
clones  that  resulted  from  this  cloning  were  characterized  by  restriction  analysis 
and  many  of  them  were  sequenced  using  an  Applied  Biosystems  automated  DNA 
sequencer. 

Construction and Screening of the Bacteriophage Lambda Library. Sindbis
virus  strain  AR339  from  A.  Schmaljohn  at  USAMRIID  was  grown  in  monolayers 
of  primary  chicken  cells  (Pierce  et  al.,  1974).  Virus  was  purified  as  described  (Bell 
et  al.,  1978),  disrupted  with  0.5%  SDS,  and  49S  genomic  RNA  extracted  with 
phenol/chloroform  (Hsu  et  al.,  1973).  After  two  ethanol  precipitations,  RNA  was 
suspended  in  distilled  water  and  stored  at  -70°C  until  used  as  a  template  for  cDNA 
synthesis. A λgt11 expression library containing short inserts of Sindbis cDNA was constructed by a modification of the procedure of Young and Davis (Young and Davis, 1983). cDNA synthesis was randomly primed with sonicated salmon testis DNA. After flush-ending the product with the Klenow fragment of DNA polymerase I, methylation with EcoRI methyltransferase, and addition of EcoRI linkers, the modified cDNA was digested with an excess of EcoRI restriction enzyme. The digested DNA was fractionated on a Sephadex CL-6B column, and cDNA fragments 100-300 base pairs in size were pooled and ligated to dephosphorylated λgt11 arms (Promega). After in vitro packaging into phage heads (Stratagene), phage plaques were grown for 6 h at 42°C. Nitrocellulose disks soaked in 10 mM isopropyl thio-β-D-galactopyranoside were then placed on top of the agar layer, and the plates were transferred to 37°C for 15 h. The filters were lifted and washed successively in 10 mM Tris-Cl pH 7.5 and 150 mM NaCl containing 5% nonfat milk. The filters were incubated overnight at 4°C with a monoclonal antibody in PBS solution containing 5% nonfat milk, washed, and then incubated for at least two hours at room temperature in the presence of 125I-conjugated protein G (0.5 µCi/ml in 5% nonfat milk). After washing and drying, the filters were exposed overnight at -80°C to Kodak X-Omat film. Immunoreactive phage were picked and rescreened until a uniformly reactive population was obtained.

MAPPING  OF  NEUTRALIZING  ANTIGENIC  EPITOPES  OF  ALPHAVIRUSES 

We have localized a site in alphavirus glycoprotein E2 that binds neutralizing antibodies. Characterization of such immunogenic domains is important in developing vaccines, because neutralizing antibodies are thought to be particularly important in protecting a vaccinee from viral infection. We developed a novel approach in which λgt11 expression libraries were constructed that expressed parts of the Sindbis genome, and these were screened with neutralizing monoclonal antibodies (MAbs). Many neutralizing antibodies react with discontinuous epitopes and thus will not react with a chimeric protein expressed in a λgt11 library. However, we succeeded in identifying one MAb which bound to specific clones within the λgt11 library (Wang and Strauss, 1991). Four λgt11 clones were found that reacted with MAb23, and a schematic of these four clones in relation to the Sindbis virus genome is shown in Fig. 1. The four clones all contain overlapping inserts from the E2 region of the genome, and the sequence of E2 from residues 173 to 220 is present in all. This demonstrates directly that this neutralizing MAb binds to glycoprotein E2 of Sindbis virus between residues 173 and 220.
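
The mapping logic used here, in which the epitope must lie in the region common to every immunoreactive insert, can be illustrated with a short sketch. The insert boundaries below are hypothetical values chosen only so that their shared region matches the reported residues 173 to 220; the actual clone boundaries are those drawn in Fig. 1 and are not reproduced here. The function name `common_overlap` is likewise an illustrative choice.

```python
from typing import List, Optional, Tuple

# Inclusive (start, end) residue numbers within glycoprotein E2.
Interval = Tuple[int, int]


def common_overlap(inserts: List[Interval]) -> Optional[Interval]:
    """Return the residue range present in every insert, or None if none exists."""
    if not inserts:
        return None
    start = max(s for s, _ in inserts)  # latest start among all inserts
    end = min(e for _, e in inserts)    # earliest end among all inserts
    return (start, end) if start <= end else None


# Hypothetical insert boundaries for four MAb-reactive clones (assumed values,
# not taken from the report).
hypothetical_inserts = [(140, 220), (173, 260), (150, 235), (165, 228)]

print(common_overlap(hypothetical_inserts))
```

With these assumed boundaries the shared region is residues 173 to 220. Adding more reactive clones can only narrow the interval, which is why several independent clones localize a linear epitope more tightly than any single one.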

The  result  with  MAb23  confirmed  and  extended  results  in  which  variants 
of  the  virus  selected  to  be  resistant  to  neutralizing  MAbs  were  sequenced  in  order 
to  identify  the  regions  within  the  glycoproteins  of  the  virus  with  which  the 
antibodies  react  (Strauss  et  al.,  1991).  This  is  illustrated  in  Fig.  1  in  which  the 
sequence  of  E2  between  residues  173  and  220  is  shown,  and  the  location  of  many 
variants  that  render  the  virus  resistant  to  neutralization  by  several  MAbs  is 
indicated.  It  is  clear  that  the  domain  between  residues  170  and  220  of  glycoprotein 
E2  of  alphaviruses  is  particularly  important  for  the  antibody  response  of  a  host. 
We  have  estimated  that  90%  of  the  neutralizing  antibodies  produced  by  an  infected 
mouse are directed against this E2 domain (Strauss et al., 1991).

This  domain  of  E2  identified  as  being  important  for  reactivity  with 
neutralizing  antibodies  also  appears  to  be  important  for  virus  attachment  to  host 
cell  receptors.  First,  many  neutralizing  antibodies  are  thought  to  inactivate  the 
virus  by  binding  to  the  domain  that  interacts  with  the  cell  receptor,  thus  blocking 
virus binding to the cell. Second, antiidiotypic antibodies made to MAbs that bind to this domain of Sindbis E2 function as antireceptor antibodies (Wang et al., 1991). Third, changes in this region of E2 alter the ability of the virus to bind to neuronal cells (Ubol and Griffin, 1991). The simplest interpretation of these results is that the E2 domain between residues 170 and 220 binds to a cell receptor to initiate infection.

Figure 1. Schematic representation of an antigenically important domain of Sindbis virus glycoprotein E2. The relative locations of the inserts in four λgt11 clones reactive with MAb 23 are mapped. The overlap region in these four clones between residues 173 and 220 of E2 is expanded below, with a number of key features indicated. Residues altered in variants resistant to MAbs are boxed and a carbohydrate attachment site is indicated with a stalked symbol (CHO).

ALPHAVIRUSES  EXAMINED  FOR  SEQUENCE  RELATIONSHIPS 

We  have  examined  12  strains  of  alphaviruses  for  their  relationships  to  one 
another.  These  12  viruses  are  shown  in  Fig.  2  together  with  the  source  from 
which they were isolated, their year of isolation, and the location in which they were isolated. Strains to be examined were chosen in consultation with Dr. Joel Dalrymple of USAMRIID, and were chosen on the basis of geography, year of
isolation,  potential  for  human  disease,  and,  in  the  case  of  Aura  virus,  as  a 
possible  parent  for  the  emergent  virus  Western  equine  encephalitis  virus. 

SEQUENCE  ANALYSIS  OF  OCKELBO  VIRUS 

We  have  determined  the  complete  nucleotide  sequence  of  the  genome  of 
Ockelbo  virus.  This  virus  was  chosen  for  analysis  because  it  causes  epidemics  of 
polyarthritis  in  humans,  a  disabling  disease  that  can  last  for  months.  The 
sequence  of  the  virus  isolated  in  1982  in  Edsbyn,  Sweden,  is  shown  in  Fig.  3.  The 
viral  genome  is  11,708  nucleotides  in  length  excluding  the  poly(A)  tail.  The 
genome  is  identical  in  organization  to  that  of  the  Sindbis  virus  AR339  strain 
(Strauss  et  al.,  1984)  isolated  in  Sindbis,  Egypt  in  1952  (Taylor  et  al.,  1955).  There 
are only 672 nucleotide differences between the two viruses (5.7% divergence), which result in 97 amino acid changes (2.6% divergence). Thus more than 85% of all
Name            Strain         Source                                  Year  Location                Reference

Subgroup I
Sindbis         AR339          Mosquito (Culex univittatus)            1952  Egypt                   Taylor et al. (1955)
Sindbis         MP684          Mosquito (Mansonia fuscopennata)        1958  Uganda
Sindbis         Girdwood       Human                                   1963  South Africa            Malherbe et al. (1963)
Sindbis         R33            Reed Warbler (Acrocephalus scirpaceus)  1971  Czechoslovakia
Sindbis         1038           Turtle Dove (Streptopelia turtur)       1964  Israel
Ockelbo         Edsbyn 82-5    Mosquito pool (Culiseta spp.)           1982  Edsbyn village, Sweden  Niklasson et al. (1984)
Ockelbo         Edsbyn 83M107  Mosquito (Culiseta morsitans)           1983  Edsbyn village, Sweden
Karelian Fever  LEIV 9298      Mosquito (Aedes communis)               1983  Central Karelia, USSR   Lvov et al. (1984, 1988)

Subgroup II
Sindbis         MM2215         Mosquito (Culex tritaeniorhynchus)      1955  Indonesia
Sindbis         A-1036         Mite (Bdellonyssus bursa)               1953  India                   Shah et al. (1960)
Sindbis         MRM18520       Mosquito (unidentified)                 1975  Queensland, Australia

Subgroup III
Whataroa        M78            Mosquito pool                           1962  New Zealand

Subgroup IV
Aura            AR10315        Mosquito (Culex spp.)                   1959  Brazil                  Causey et al. (1963)

Figure 2. Strains of Sindbis virus and related viruses used in this study.
[Sequence listing: Ockelbo virus genomic RNA with single-letter amino acid translation beneath each line, including the labelled start of nsP2. The listing is not legible in this copy.]

Figure 3a. See legend on the last page of this sequence.

[Sequence listing, continued: Ockelbo virus genomic RNA with single-letter amino acid translation beneath each line, including the labelled starts of nsP4 and E3. The listing is not legible in this copy.]

Figure 3b. See legend on the last page of this sequence.
[Sequence listing, concluded: Ockelbo virus genomic RNA through nucleotide 11708 with single-letter amino acid translation beneath each line, including the labelled start of 6K. The listing is not legible in this copy.]

Figure 3c. Complete nucleotide sequence of the Ockelbo virus genome. The sequence is shown from 5' to 3' and translated using the single letter amino acid code. Nucleotides different from those in HRSP are underlined, and changed amino acids are boxed. Deletions relative to HR are indicated by solid triangles pointing upward and the number of residues deleted. Insertions have both amino acids and nucleotides boxed together, and an open triangle pointing downward. Termination codons are labelled Am (Amber, UAG) or Op (Opal, UGA) as appropriate. Nucleotides are numbered 5' to 3'; amino acid numbering begins again at the beginning of each final protein product.
nucleotide differences are silent, illustrating the importance of conservation of amino acid sequence.
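
The divergence figures quoted for the Ockelbo and AR339 genomes (672 nucleotide differences over 11,708 nucleotides, and 97 amino acid changes) can be checked with a few lines of arithmetic. The sketch below is only a rough check. It assumes that each amino acid change is caused by a single nucleotide change, and it counts every remaining difference as silent, including any that fall in nontranslated regions. Neither assumption is stated in the report.

```python
def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


GENOME_LENGTH = 11708   # nucleotides, excluding the poly(A) tail (from the report)
NT_DIFFERENCES = 672    # nucleotide differences, Ockelbo vs. AR339 (from the report)
AA_CHANGES = 97         # amino acid changes (from the report)

# Nucleotide divergence over the whole genome.
nt_divergence = percent(NT_DIFFERENCES, GENOME_LENGTH)

# Assumption: one nucleotide change per amino acid change, so every other
# nucleotide difference leaves the protein sequence unchanged.
silent_nt = NT_DIFFERENCES - AA_CHANGES
silent_fraction = percent(silent_nt, NT_DIFFERENCES)

print(f"nucleotide divergence: {nt_divergence:.1f}%")
print(f"silent differences:    {silent_nt} of {NT_DIFFERENCES} ({silent_fraction:.1f}%)")
```

Under these assumptions the calculation gives about 5.7% nucleotide divergence and 575 silent differences, or roughly 85.6% of the total, in line with the figures in the text.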

Glycoprotein E2 is particularly important for antigenicity, as described above, and changes in E2 have been associated with changes in virulence (Lustig et al., 1988; Olmsted et al., 1986; Strauss et al., 1991; Tucker and Griffin, 1991). The differences in glycoprotein E2 between six strains of Sindbis virus are listed in Fig. 4. The residues at positions 172, 209, 212, and 216 are known to be important determinants of the antigenicity of the virus (Strauss et al., 1991), and the changes in these positions are important for the differences in the cross-reactivity of the viruses with antibodies. The residues at 55 and 172 are known to be important determinants of the neurovirulence of the virus in a mouse model (Lustig et al., 1988), and it is possible that the amino acid difference at position 55 may be important for the increased virulence of Ockelbo virus compared to the other strains of Sindbis virus in Fig. 4.

ANALYSIS OF 3' TERMINAL NONTRANSLATED SEQUENCE

To  study  the  relationships  among  a  number  of  Sindbis  viruses  present  in 
nature,  the  sequences  of  the  3’  nontranslated  regions  (NTR)  were  obtained  for  a 
number  of  strains.  These  sequences  are  shown  in  Fig.  5.  The  sequence  identity 
throughout  this  region  is  greater  than  80%  for  all  viruses  shown,  and  the 
sequence  organization  is  identical  except  for  a  few  scattered  insertions  and 
deletions.  In  the  3’  NTR  there  are  three  repeated  elements  that  are  highly 
conserved  (boxed  in  the  figure).  As  an  example  of  the  conservation  of  these 
elements,  there  are  49  differences  in  the  3’  NTRs  of  the  Australian  and  AR339 
strains  that  occur  outside  the  repeated  elements  (24.1%  divergence)  but  only  7 
changes  within  these  elements  (5.8%  divergence),  and  the  overall  divergence  is 
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Figure  4.  Amino  acid  differences  in  the  glycoprotein  E2  of  various  Sindbis  strains.  The 
sequence of HRSP is from Strauss et al. (1984). The sequence marked DG is the SVIA
strain published in Lustig et al. (1988). AS is our unpublished sequence of the strain used
by A. Schmaljohn for the isolation of antigenic escape mutants (Stec et al., 1986); RJ is
the  sequence  from  Davis  et  al.  (1986)  of  a  laboratory  strain  from  Robert  Johnston.  The 
sequence  of  AR86  was  reported  in  Russell  et  al.  (1989),  and  the  Ockelbo  sequence  was 
presented  in  Figure  3. 
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Figure 5. Sequence of the 3' termini of several Sindbis viruses. The sequences of Ockelbo 83M107, Karelian
fever, and South African Sindbis (Girdwood) were determined from cloned cDNA. Those of the Indian A1036
and Australian 18620 isolates were determined directly from RNA by dideoxy sequencing using reverse
transcriptase and a T12GA primer. The Ockelbo 82 sequence is from Fig. 3 and that of AR339 (HRSP) is
from Strauss et al. (1984). Three repeated sequence elements of 40 nucleotides are boxed. The translated
sequence is for AR339 (HRSP) and any amino acid that differs in the other viruses is boxed. This figure is
from Shirako et al. (1991).
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18.1%.  From  such  analysis,  we  propose  that  these  repeated  and  conserved 
elements  play  an  important  role  in  viral  RNA  replication,  and  this  role  is  probably 
more  important  in  mosquito  cells  than  in  vertebrate  cells  (Kuhn  et  al.,  1990). 
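
The comparison of divergence inside and outside the repeated elements can be sketched as follows. This is only a minimal illustration, assuming two pre-aligned sequences of equal length and a list of half-open index ranges marking the conserved elements. The function name, the toy sequences, and the element coordinates are hypothetical and are not the data of the figure; alignment gaps, if present, are simply counted as differences.

```python
def divergence_by_region(seq_a, seq_b, elements):
    """Percent divergence inside the elements, outside them, and overall.

    seq_a, seq_b: aligned sequences of equal length (gaps count as differences).
    elements: list of (start, end) half-open index ranges; hypothetical
    coordinates of the conserved repeated elements.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Collect every alignment column that falls inside a conserved element.
    inside = set()
    for start, end in elements:
        inside.update(range(start, end))

    diff_in = diff_out = n_in = n_out = 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if i in inside:
            n_in += 1
            diff_in += a != b
        else:
            n_out += 1
            diff_out += a != b

    def pct(differences, positions):
        return 100.0 * differences / positions if positions else 0.0

    return (pct(diff_in, n_in),
            pct(diff_out, n_out),
            pct(diff_in + diff_out, n_in + n_out))


# Toy example (not real viral sequence): one conserved element at columns 0-9.
toy_a = "AAAAACCCCCGGGGGTTTTT"
toy_b = "AAAAACCCCAGGTGGTATAT"
inside_pct, outside_pct, overall_pct = divergence_by_region(toy_a, toy_b, [(0, 10)])
```

A markedly lower value inside the elements than outside them is the pattern described above for the Australian and AR339 strains.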

The relationships among these viruses are illustrated in Fig. 6. Three points
are obvious from this diagram. One is that the Sindbis strains analyzed can be
divided into a European-African group and an Asian-Australian group. The
second point is that Ockelbo virus and Karelian fever virus are virtually identical.
The third point is that Ockelbo virus is more closely related to the South African
strain of Sindbis virus isolated in 1963 (and which is also capable of causing
human illness) than it is to the Egyptian strain isolated in 1952. We conclude
from this last point that Ockelbo virus was probably introduced into Northern
Sweden from South Africa in the 1960s, from where it spread into Finland (where
it causes the disease called Pogosta) and the Karelian region of Russia.

SEQUENCE STUDIES OF AURA RNA

We  have  obtained  the  sequence  of  essentially  all  of  the  genome  of  Aura  virus 
and  are  currently  assembling  this  sequence.  We  were  particularly  interested  in 
this virus because we have previously shown that Western equine encephalitis
virus  (WEE),  previously  thought  to  be  closely  related  to  Sindbis  virus,  is  in  fact  a 
recombinant  virus  in  which  most  of  the  genome  was  derived  from  Eastern  equine 
encephalitis  virus  and  only  the  surface  glycoproteins  were  derived  from  a  Sindbis- 
like  virus  (Hahn  et  al.,  1988).  Thus  the  question  arose  as  to  whether  there  is  a 
virus  found  in  the  Americas  that  is  closely  related  to  Sindbis  and  that  could  have 
served  as  the  second  parent  of  WEE.  The  question  is  of  particular  interest  because 
WEE  emerged  from  a  recombination  event. 
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The  sequence  of  about  5000  nucleotides  of  Aura  RNA  in  the  nonstructural 
protein  coding  region  is  shown  in  Fig.  7.  This  sequence  begins  in  the  5’  NTR  and 
continues through nsP1, nsP2, and part of nsP3. From this sequence, it is clear
that  Aura  virus  is  closely  related  to  Sindbis  virus.  Comparison  of  the  amino  acid 
sequences  of  Sindbis  virus  and  of  Aura  virus  in  the  region  represented  by  the 
Aura  sequence  in  Fig.  7  shows  that  the  two  sequences  are  80%  identical, 
illustrating  that  Aura  is  in  fact  a  Sindbis-like  virus.  We  also  found  that  the  3' 
NTR  of  Aura  RNA  is  Sindbis-like.  As  described  above,  Sindbis-like  viruses 
contain  three  copies  of  a  conserved  sequence  element  that  we  postulate  is 
important  for  RNA  replication.  Although  other  alphaviruses  often  contain 
repeated  sequence  elements,  these  elements  are  completely  different  in  sequence 
from  the  Sindbis  sequence.  Furthermore,  WEE  lacks  the  characteristic  Sindbis  3' 
NTR,  and  contains  instead  a  chimeric  3’  NTR.  Thus  Aura  virus  represents  the 
first known example of a true Sindbis-like virus in the Americas.

Aura  virus  is  widely  distributed  in  South  America,  having  been  isolated  in 
Brazil  and  in  Northern  Argentina.  Analysis  of  the  data  is  not  yet  complete,  but  it 
is  possible  that  Aura  virus  represents  the  ancestral  Sindbis-like  virus,  and  that  it 
was  transmitted  to  the  Old  World  to  serve  as  the  founder  of  the  Sindbis  viruses  in 
the  Old  World,  as  we  previously  postulated  (Levinson  et  al.,  1990).  Aura  virus 
may  have  served  as  one  of  the  parents  of  WEE,  contributing  its  glycoproteins  to 
this  recombinant  virus  (Hahn  et  al.,  1988). 
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GCA TCC CGA CTC GCA GAC AAA GCA GGG GAA ATT ACC AAC AAG AGG CTG CAT GAT AAA CTT
ala ser arg leu ala asp lys ala gly glu ile thr asn lys arg leu his asp lys leu

421/119
GCA GAC CTC AAG TCG GTC CTC GAG TCG CCG GAT GCT GAA ACT GGT ACC ATT TGT TTC CAC
ala asp leu lys ser val leu glu ser pro asp ala glu thr gly thr ile cys phe his

481/139
AAT GAC GTA ATA TGC CGT ACG ACA GCG GAG GTA TCA GTT ATG CAA AAT GTG TAT ATC AAT
asn asp val ile cys arg thr thr ala glu val ser val met gln asn val tyr ile asn

541/159
GCA CCT TCG ACC ATT TAC CAT CAG GCC CTA AAG GGA GTC AGA AAA CTG TAT TGG ATC GGG
ala pro ser thr ile tyr his gln ala leu lys gly val arg lys leu tyr trp ile gly

601/179
TTC GAT ACA ACG CAG TTT ATG TTC TCC TCG ATG GCA GGG TCG TAT CCG TCC TAC AAT ACT
phe asp thr thr gln phe met phe ser ser met ala gly ser tyr pro ser tyr asn thr

661/199
AAT TGG GCC GAT GAA AGG GTG CTG GAA GCG CGT AAT ATA GGC CTA TGT AGC ACG AAG CTG
asn trp ala asp glu arg val leu glu ala arg asn ile gly leu cys ser thr lys leu

721/219
AGA GAG GGT ACG ATG GGC AAA CTG TCT ACC TTC CGG AAA AAG GCC TTG AAA CCT GGA ACT
arg glu gly thr met gly lys leu ser thr phe arg lys lys ala leu lys pro gly thr

781/239
AAC GTG TAC TTC TCT GTC GGT TCG ACA CTC TAC CCT GAG AAT AGA GCG GAC CTG CAG AGT
asn val tyr phe ser val gly ser thr leu tyr pro glu asn arg ala asp leu gln ser

841/259
TGG CAC CTA CCA TCT GTG TTC CAC TTG AAA GGT AAA CAA TCC TTT ACG TGC CGC TGT GAT
trp his leu pro ser val phe his leu lys gly lys gln ser phe thr cys arg cys asp

901/279
ACG GCG GTT AAC TGC GAA GGA TAC GTA GTC AAG AAG ATC ACC ATC AGC CCC GGG ATC ACG
thr ala val asn cys glu gly tyr val val lys lys ile thr ile ser pro gly ile thr

961/299
GGG CGT GTC AAT CGG TAC ACT GTG ACT AAC AAC AGC GAG GGA TTC TTG CTG TGT AAG ATC
gly arg val asn arg tyr thr val thr asn asn ser glu gly phe leu leu cys lys ile

1021/319
ACA GAT ACG GTC AAA GGG GAG CGT GTA TCG TTC CCT GTC TGT ACG TAT ATT CCA CCT TCA
thr asp thr val lys gly glu arg val ser phe pro val cys thr tyr ile pro pro ser

1081/339
ATC TGT GAC CAA ATG ACA GGT ATA TTG GCC ACT GAT ATC CAA CCC GAA GAC GCG CAA AAG
ile cys asp gln met thr gly ile leu ala thr asp ile gln pro glu asp ala gln lys

Figure  7a.  See  legend  on  last  page  of  this  sequence 

1141/359
TTG CTG GTA GGA CTG AAC CAA CGC ATA GTC GTG AAC GGA AAA ACT AAT AGA AAC ACC AAC
leu leu val gly leu asn gln arg ile val val asn gly lys thr asn arg asn thr asn

1201/379
ACG ATG CAG AAC TAT CTC CTG CCC GCG GTG GCT ACA GGT CTG AGT AAA TGG GCC AAA GAA
thr met gln asn tyr leu leu pro ala val ala thr gly leu ser lys trp ala lys glu

1261/399
AGA AAG GCA GAC TGC AGT GAC GAG AAA CCA TTG AAT GTG AGA GAA CGC AAA CTA GCT TTC
arg lys ala asp cys ser asp glu lys pro leu asn val arg glu arg lys leu ala phe

1321/419
GGT TGC CTA TGG GCT TTC AAG ACC AAG AAG ATC CAT TCT TTT TAC CGC CCG CCA GGC ACG
gly cys leu trp ala phe lys thr lys lys ile his ser phe tyr arg pro pro gly thr

1381/439
GAG ACT ATA GTA AAA GTC GCA GCG GAA TTC AGT GCG TTC CCT ATG TCC TCG GTG TGG ACT
gln thr ile val lys val ala ala glu phe ser ala phe pro met ser ser val trp thr

1441/459
ACG TCA CTG CCA ATG TCA CTG AGA CAG AAA GTT AAA CTG CTT CTT GTA AAG AAA ACC AAT
thr ser leu pro met ser leu arg gln lys val lys leu leu leu val lys lys thr asn

1501/479
AAA CCG GTA GTC ACT ATT ACT GAC ACT GCG GTA AAA AAC GCA CAA GAG GCA TAT AAC GAA
lys pro val val thr ile thr asp thr ala val lys asn ala gln glu ala tyr asn glu

1561/499
GCC GTC GAG ACA GCA GAA GCG GAG GAG AAA GCG AAG GCC TTA CCT CCG CTG AAG CCG ACG
ala val glu thr ala glu ala glu glu lys ala lys ala leu pro pro leu lys pro thr

1621/519
GCA CCC CCT GTA GCG GAG GAC GTC AAA TGC GAG GTC ACC GAC CTG GTA GAC GAT GCG GGA
ala pro pro val ala glu asp val lys cys glu val thr asp leu val asp asp ala gly

1681/539
GCG GCC CTG GTC GAG ACG CCC CGG GGA AAG ATA AAA ATT ATC CCA CAG GAA GGG GAC GTG
ala ala leu val glu thr pro arg gly lys ile lys ile ile pro gln glu gly asp val

1741/559
CGT ATT GGT TCC TAC ACA GTC ATT TCT CCA GCG GCA GTC CTT AGA AAT CAA CAA CTG GAG
arg ile gly ser tyr thr val ile ser pro ala ala val leu arg asn gln gln leu glu

1801/579
CCA ATC CAC GAG TTA GCA GAG CAG GTG AAA ATT ATC ACG CAC GGT GGC CGA ACA GGC AGG
pro ile his glu leu ala glu gln val lys ile ile thr his gly gly arg thr gly arg

1861/599
TAT TCC GTC GAA CCT TAC GAT GCT AAG GTT CTC CTG CCA ACA GGA TGC CCC ATG TCC TGG
tyr ser val glu pro tyr asp ala lys val leu leu pro thr gly cys pro met ser trp

1921/619
CAA CAT TTC GCG GCC TTG AGC GAA AGC GCT ACG TTA GTC TAC AAT GAG AGA GAG TTC CTG
gln his phe ala ala leu ser glu ser ala thr leu val tyr asn glu arg glu phe leu

1981/639
AAC CGG AAA CTC CAT CAC ATC GCT ACG AAG GGT GCG GCA AAA AAC ACT GAG GAA GAA CAA
asn arg lys leu his his ile ala thr lys gly ala ala lys asn thr glu glu glu gln

2041/659
TAC AAA GTA TGC AAA GCT AAA GAC ACG GAT CAT GAG TAC GTA TAC GAC GTA GAT GCC AGA
tyr lys val cys lys ala lys asp thr asp his glu tyr val tyr asp val asp ala arg

2101/679
AAA TGC GTA AAA AGA GAG CAT GCA CAA GGG CTA GTA CTA GTT GGG GAA CTA ACT AAT CCG
lys cys val lys arg glu his ala gln gly leu val leu val gly glu leu thr asn pro

2161/699
CCT TAC CAC GAG CTG GCA TAC GAA GGA TTA CGT ACA CGA CCC GCT GCC CCT TAC CAT ATC
pro tyr his glu leu ala tyr glu gly leu arg thr arg pro ala ala pro tyr his ile

Figure  7b.  See  legend  on  last  page  of  this  sequence 
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2221/719
GAA ACA CTG GGG GTC ATT GGA ACA CCG GGG TCA GGT AAG TCG GCC ATC ATA AAA TCT ACG
glu thr leu gly val ile gly thr pro gly ser gly lys ser ala ile ile lys ser thr

2281/739
GTA ACA CTA AAA GAC CTC GTA ACT AGC GGT AAG AAA GAA AAT TGC AAA GAA ATA GAG AAT
val thr leu lys asp leu val thr ser gly lys lys glu asn cys lys glu ile glu asn

2341/759
GAC GTC CAG AAA ATG CGG GGA ATG ACT ATA GCT ACG AGA ACG GTA GAC TCG GTA CTT CTT
asp val gln lys met arg gly met thr ile ala thr arg thr val asp ser val leu leu

2401/779
AAT GGA TGG AAG AAA GCA GTA GAC GTC CTA TAT GTG GAT GAA GCG TTT GCA TGT CAT GCA
asn gly trp lys lys ala val asp val leu tyr val asp glu ala phe ala cys his ala

2461/799
GGC ACC TTA ATG GCA TTG ATT GCC ATT GTC AAA CCG AGA CGT AAA GTA GTA CTG TGC GGC
gly thr leu met ala leu ile ala ile val lys pro arg arg lys val val leu cys gly

2521/819
GAC CCG AAG CAG TGG CCC TTC TTT AAT TTA ATG CAA CTG AAG GTA AAC TTC AAC AAC CCC
asp pro lys gln trp pro phe phe asn leu met gln leu lys val asn phe asn asn pro

2581/839
GAG CGA GAC CTG TGT ACT TCC ACC CAT TAT AAA TAT ATC TCT CGC AGG TGC ACC CAA CCT
glu arg asp leu cys thr ser thr his tyr lys tyr ile ser arg arg cys thr gln pro

2641/859
GTT ACA GCC ATA GTG TCT ACA TTA CAC TAT GAC GGA AAG ATG AGG ACT ACG AAT CCC TGC
val thr ala ile val ser thr leu his tyr asp gly lys met arg thr thr asn pro cys

2701/879
AAA AGG GCT ATC GAA ATA GAC GTA AAC GGA TCG ACT AAG CCC AAG AAA GGA GAC ATA GTG
lys arg ala ile glu ile asp val asn gly ser thr lys pro lys lys gly asp ile val

2761/899
TTG ACG TGT TTC CGT GGG TGG GTT AAG CAG GGG CAA ATC GAT TAG CCC GGA CCC GGA GGT
leu thr cys phe arg gly trp val lys gln gly gln ile asp tyr pro gly pro gly gly

2821/919
CAT GAC CGT GCA GCT TCT CAA GGG CTA ACC AGA AGG GGC GTT TAT GCG GTC AGA CAG AAA
his asp arg ala ala ser gln gly leu thr arg arg gly val tyr ala val arg gln lys

2881/939
GTA AAT GAA AAC CCA CTA TAT GCA GAG AAG TCA GAA CAC GTT AAC GTG TTA CTT ACT AGG
val asn glu asn pro leu tyr ala glu lys ser glu his val asn val leu leu thr arg

2941/959
ACG GAA GAT CGC ATA GTG TGG AAG ACA CTG CAA GGG GAT CCT TGG ATT AAG TAC CTC ACT
thr glu asp arg ile val trp lys thr leu gln gly asp pro trp ile lys tyr leu thr

3001/979
AAC GTT CCA AAA GGG AAC TTT ACA GCC ACT TTA GAA GAA TGG CAG GCG GAA CAC GAG GAC
asn val pro lys gly asn phe thr ala thr leu glu glu trp gln ala glu his glu asp

3061/999
ATT ATG AAG GCC ATT AAT TCT ACA TCC ACA GTA TCT GAC CCT TTC GCC AGC AAA GTG AAT
ile met lys ala ile asn ser thr ser thr val ser asp pro phe ala ser lys val asn

3121/1019
ACA TGC TGG GCT AAA GCT ATT ATA CCC ATC CTA AGA ACG GCA GGG ATA GAA CTT ACA TTC
thr cys trp ala lys ala ile ile pro ile leu arg thr ala gly ile glu leu thr phe

3181/1039
GAG CAG TGG GAA GAT CTA TTC CCG CAA TTT CGT AAT GAC CAA CCT TAC TCC GTG ATG TAT
glu gln trp glu asp leu phe pro gln phe arg asn asp gln pro tyr ser val met tyr

3241/1059
GCC CTA GAT GTG ATA TGT ACC AAG ATG TTC GGC ATG GAT CTG AGC AGT GGG ATC TTC TCT
ala leu asp val ile cys thr lys met phe gly met asp leu ser ser gly ile phe ser

3301/1079
CGT CCT GAG ATA CCT CTA ACG TTC CAT CCC GCG GAC GTC GGC CGA GTG AGA GCT CAC TGG
arg pro glu ile pro leu thr phe his pro ala asp val gly arg val arg ala his trp

Figure  7c.  See  legend  on  last  page  of  this  sequence 
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3361/1099
GAT AAC TCC CCA GGA GGG CAG AAG TTT GGG TAT AAC AAG GCG GTA ATC CCA ACT TGC AAG
asp asn ser pro gly gly gln lys phe gly tyr asn lys ala val ile pro thr cys lys

3421/1119
AAA TAC CCA GTG TAC TTA AGA GCA GGA AAA GGG GAC CAA ATA CTC CCC ATA TAT GGC AGA
lys tyr pro val tyr leu arg ala gly lys gly asp gln ile leu pro ile tyr gly arg

3481/1139
GTT TCA GTC CCA TCG GCA CGG AAC AAT TTA GTT CCC TTA AAC AGA AAT CTA CCA CAC TCG
val ser val pro ser ala arg asn asn leu val pro leu asn arg asn leu pro his ser

3541/1159
CTA ACT GCA AGC CTG CAG AAA AAA GAA GCA GCT CCC TTG CAC AAG TTC CTT AAC CAA CTA
leu thr ala ser leu gln lys lys glu ala ala pro leu his lys phe leu asn gln leu

3601/1179
CCA GGA CAC AGT ATG CTG CTG GTC TCT AAG GAA ACA TGC TAT TGC GTG TCC AAG CGA ATC
pro gly his ser met leu leu val ser lys glu thr cys tyr cys val ser lys arg ile

3661/1199
ACA TGG GTC GCT CCG CTG GGA GTC AGA GGA GCT GAC CAC AAC CAT GAC CTG CAT TTC GGG
thr trp val ala pro leu gly val arg gly ala asp his asn his asp leu his phe gly

3721/1219
TTC CCA CCA CTG TCC AGA TAC GAC CTT GTG GTG GTT AAT ATG GGA CAA CCG TAC AGG TTC
phe pro pro leu ser arg tyr asp leu val val val asn met gly gln pro tyr arg phe

3781/1239
CAT CAC TAC CAG CAG TGC GAG GAG CAT GCC GGC CTC ATG AGG ACG TTG GCC CGG TCA GCA
his his tyr gln gln cys glu glu his ala gly leu met arg thr leu ala arg ser ala

3841/1259
CTC AAC TGC CTA AAA CCA GGA GGA ACA TTA GCC CTG AAA GCA TAT GGT TTC GCC GAC TCC
leu asn cys leu lys pro gly gly thr leu ala leu lys ala tyr gly phe ala asp ser

3901/1279
AAT AGT GAG GAC GTT GTT CTG TCT TTA GCG AGG AAA TTC GTG CGG GCA TCC GCA GTG AGA
asn ser glu asp val val leu ser leu ala arg lys phe val arg ala ser ala val arg

3961/1299
CCA TCG TGT ACA CAG TTT AAC ACA GAG ATG TTC TTT GTA TTT AGG CAG CTG GAC AAC GAT
pro ser cys thr gln phe asn thr glu met phe phe val phe arg gln leu asp asn asp

4021/1319
CGT GAG CGC CAA TTC ACT CAG CAT CAC TTG AAT TTA GCA GTA TCC AAT ATft TTC GAC AAT
arg glu arg gln phe thr gln his his leu asn leu ala val ser asn ile phe asp asn

4081/1339
TAT AAA GAC GGA TCC GGA GCA GCT CCT TCT TAT CGC GTT AAG AGA ATG AAT ATC GCA GAC
tyr lys asp gly ser gly ala ala pro ser tyr arg val lys arg met asn ile ala asp

4141/1359
TGC ACA GAA GAA GCA GTG GTG AAC GCA GCT AAC GCG CGG GGA AAA CCT GGG GAC GGA GTA
cys thr glu glu ala val val asn ala ala asn ala arg gly lys pro gly asp gly val

4201/1379
TGC AGA GCT ATC TTC AAA AAG TGG CCG AAG TCA TTT GAG AAC GCT ACC ACT GAA GTG GAA
cys arg ala ile phe lys lys trp pro lys ser phe glu asn ala thr thr glu val glu

4261/1399
ACC GCG GTC ATG AAA CCA TGC CAC AAC AAG GTT GTT ATA CAT GCA GTG GGT CCT GAT TTT
thr ala val met lys pro cys his asn lys val val ile his ala val gly pro asp phe

4321/1419
AGA AAG TAC ACG TTG GAG GAA GCG ACG AAG CTA CTG CAG AAC GCA TAC CAT GAT GTG GCA
arg lys tyr thr leu glu glu ala thr lys leu leu gln asn ala tyr his asp val ala

4381/1439
AAG ATA GTG AAC GAG AAA GGC ATC TCC TCG GTA GCT ATA CCG CTG CTC TCA ACA GGT ATC
lys ile val asn glu lys gly ile ser ser val ala ile pro leu leu ser thr gly ile

4441/1459
TAT GCT GCC GGA GCT GAT CGC CTG GAT CTC TCG CTG AGA TGT CTT TTC ACC GCG CTG GAT
tyr ala ala gly ala asp arg leu asp leu ser leu arg cys leu phe thr ala leu asp

Figure 7d. See legend on last page of this sequence
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4501/1479
CGT ACG GCG GAT GTC ACA ATA TAT TGC CTA GAT AAG AAG TGG GAG CAA CGC ATA GCA
arg thr asp ala asp val thr ile tyr cys leu asp lys lys trp glu gln arg ile ala

4561/1499
GAT GCT ATT AGG ATG CGA GAA CAA GTA ACT GAA TTA AAA GAT CCG GAC ATA GAG ATA GAT
asp ala ile arg met arg glu gln val thr glu leu lys asp pro asp ile glu ile asp

4621/1519
GAA GGA TTA ACC CGG GTA CAC CCA GAT AGC TGC CTC AAG GAT CAC ATA GGC TAC AGT ACC
glu gly leu thr arg val his pro asp ser cys leu lys asp his ile gly tyr ser thr

4681/1539
CAG TAT GGG AAA TTG TAG TCA TAC TTT GAA GGT ACT AAA TTC CAC CAA ACC GCA AAA GAC
gln tyr gly lys leu tyr ser tyr phe glu gly thr lys phe his gln thr ala lys asp

4741/1559
ATA GCC GAG ATT CGT GCG CTG TTT CCT GAT GTA CAA GCC GCT AAC GAA CAA ATC TGC CTG
ile ala glu ile arg ala leu phe pro asp val gln ala ala asn glu gln ile cys leu

4801/1579
TAG ACT TTA GGC GAA CCG ATG GAG TCC ATA CGC GAA AAG TGC CCA GTC GAA GAC TCC CCG
tyr thr leu gly glu pro met glu ser ile arg glu lys cys pro val glu asp ser pro

4861/1599
GCA TCA GCA CCT CCT AAG ACA ATA CCT TGC CTA TGT ATG TAT GCT ATG ACA GCC GAA CGT
ala ser ala pro pro lys thr ile pro cys leu cys met tyr ala met thr ala glu arg

4921/1619
ATT TGC CGC GTA CGC AGT AAC TCC GTA ACG AAC ATA ACG GTG TGC TCA TCC TTT CCG TTA
ile cys arg val arg ser asn ser val thr asn ile thr val cys ser ser phe pro leu

4981/1639
CCC AAG TAG CGA ATA AAG AAC GTA CAA AAG ATA CK.; ACG AAA GTG
pro lys tyr arg ile lys asn val gln lys ile gln cys thr lys val

Figure 7e. Translated sequence of Aura virus. This sequence starts near the 5' terminus of the
genome, although the exact 5' end is not known. The translated sequence shown
encompasses nsP1, nsP2, and the N-terminal (conserved) region of nsP3. Nucleotides are
numbered  from  the  beginning  of  the  sequence;  amino  acids  are  numbered  from  the 
beginning  of  the  open  reading  frame. 
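
The layout of Figure 7 (rows of codons, the three-letter amino acids beneath them, and paired nucleotide/amino acid position labels) can be reproduced with a short sketch. It assumes the standard genetic code, and it assumes that a label such as 421/119 gives the first nucleotide and the first amino acid of its row; the function names are hypothetical. Termination codons are reported as "ter" and no read-through is modelled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to lowercase three-letter code, as printed in the figure.
THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
    "*": "ter",
}


def translate(codons):
    """Translate a list of DNA codons; unreadable codons become '???'."""
    result = []
    for codon in codons:
        one_letter = CODON_TABLE.get(codon.upper())
        result.append(THREE_LETTER[one_letter] if one_letter else "???")
    return result


def residue_number(nt_position, ref_nt=421, ref_aa=119):
    """Amino acid number of the codon containing nt_position.

    ref_nt/ref_aa are one label pair from the figure (assumed to mark the
    first nucleotide and first amino acid of a row).
    """
    return ref_aa + (nt_position - ref_nt) // 3


# First ten codons of the first row shown in Figure 7a.
row = "GCA TCC CGA CTC GCA GAC AAA GCA GGG GAA".split()
translated = translate(row)
```

Applied to the printed codon rows, a check of this kind flags positions where the codon and the amino acid printed beneath it disagree, which in a scanned copy most often indicates a misread character.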
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SEQUENCE ANALYSIS OF WHATAROA VIRUS


We  have  obtained  most  of  the  sequence  of  Whataroa  virus  RNA,  11.7  kb  in 
length.  This  sequence  is  being  assembled  to  give  the  complete  sequence  of  this 
virus  RNA.  We  were  interested  in  this  virus  because  it  represents  a 
geographically  isolated  Sindbis-like  virus,  being  found  in  New  Zealand  and 
presumably  transferred  there  by  migratory  birds. 

The sequence of a stretch of the nonstructural protein coding region of the
Whataroa genome is shown in Fig. 8. The sequence begins near the beginning of
the nsP2 gene and continues through to the end of the nsP2 region of the virus
genome, a stretch of about 2000 nucleotides. From the analysis of this sequence,
Whataroa virus can clearly be considered to be a strain of Sindbis virus that has
spread to New Zealand. The amino acid sequence deduced from the nucleotide
sequence in Fig. 8 is compared to that of the AR339 strain of Sindbis virus, isolated
from Egypt in 1952, in Fig. 9. These amino acid sequences are 84% identical.
Furthermore, we found that Whataroa virus RNA has the characteristic 3' NTR
of the Sindbis viruses.
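
A percent identity figure of this kind can be computed from an alignment as sketched below. This is an illustration only: the function name is hypothetical, the report does not state how gaps were treated (here columns with a gap in either sequence are skipped), and the comparison sequence in the example is invented rather than taken from AR339.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped columns carrying the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# First 20 residues of the Whataroa fragment, against a hypothetical
# sequence carrying two substitutions (not real AR339 data).
whataroa_fragment = "FINRKLYHIAVHGPAKNTEE"
hypothetical_other = "FVNRKLYHIAMHGPAKNTEE"
identity = percent_identity(whataroa_fragment, hypothetical_other)
```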

SEQUENCE  ANALYSIS  OF  OTHER  ALPHAVIRUSES 

We  have  obtained  the  nucleotide  sequence  encoding  the  nsP3  and  nsP4 
genes  of  several  other  alphaviruses,  in  order  to  examine  the  relationships  of 
viruses  isolated  from  Australia,  India,  and  South  Africa  to  other  alphaviruses. 
The sequence of this region for a Sindbis virus isolated from India in 1953 is shown in
Fig. 10, that for a Sindbis virus isolated in Australia in 1975 is shown in Fig. 11,
and that for a Sindbis virus isolated from South Africa in 1963 is shown in Fig. 12.
The South African isolate came from a human patient exhibiting symptoms of
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1  FINRKLYHIAVHGPAKNTEE  20
1  TTCATTAACAGGAAATTGTACCACATTGCAGTTCATGGTCCCGCGAAGAATACTGAGGAA  60

21  EQYKAMRAEAADTEYVFDVD  40
61  GAGCAGTATAAAGCTATGAGAGCAGAAGCGGCGGACACCGAATATGTCTTCGATGTCGAC  120

41  KKKCVKREEASGLVLVGELT  60
121  AAGAAGAAGTGCGTTAAGAGAGAAGAAGCATCGGGTCTTGTGTTAGTAGGCGAACTTACC  180

61  NPPYHEMALEGLKTRPAVPY  80
181  AACCCGCCATACCATGAAATGGCGCTGGAAGGGCTGAAGACCCGTCCTGCAGTACCTTAT  240

81  KVETIGVIGTPGSGKSAIIK  100
241  AAAGTTGAAACAATCGGAGTCATCGGCACACCGCGATCCGGAAAATCCGCAATCATTAAA  300

101  NIVTTRDLVTSGKKENCREI  120
301  AACATCGTCACTACCAGGGATCTTGTGACCAGCGGAAAGAAAGAAAACTGCCGGGAAATA  360

121  EADVLKHRKMQIVSKTVDSV  140
361  GAAGCTGACGTCCTCAAACACCGAAAAATGCAAATCGTTTCAAAGACGGTCGACTCCGTT  420

141  LLNGCHKSVDILYVDEAYAC  160
421  TTGCTTAATGGTTGCCACAAGTCACTCGACATCCTGTATGTCGACGAAGCTTACGCGTGC  480

161  HAGTLLALIAIVRPRNKVVL  180
481  CACGCTGGCACCCTATTGGCCTTAATCGCCATAGTCCGACCTAGAAATAAAGTGGTCCTA  540

181  CGDPKQCGFFNMMQLKVHFN  200
541  TGTGGCGACCCAAAACAGTGTGGTTTCTTCAACATGATGCAGCTGAAGGTCCACTTTAAC  600

201  DPERDICTKTFYKYISRRCT  220
601  GACCCTGAACGCGACATTTGCACGAAGACGTTCTACAAATACATTTCTCGTCGGTGCACG  660

221  QPVTAIVSTLHYNGKMRTTN  240
661  CAACCGGTGACAGCAATTGTGTCTACACTGCACTATAACGGAAAAATGCGCACCACCAAC  720

241  PCNKNIVIDITGQTKPKPGD  260
721  CCATGTAACAAGAACATCGTAATCGACATTACCGGACAAACCAAACCAAAACCAGGAGAT  780

261  IILTCFRGWVKQLQIEYPGH  280
781  ATTATCCTGACGTGTTTCAGGGGGTGGGTCAAGCAGCTGCAGATTGAATACCCAGGACAC  840

281  EVMTAAVSQGLTRKGVFPVR  300
841  GAAGTTATGACTGCGGCAGTTTCACAAGGATTGACGCGAAAAGGGGTCTTTCCCGTAAGA  900

301  GKVNENPLYAITSEHVNVLL  320
901  GGAAAAGTCAACGAGAACCCGTTATATGCCATCACTTCTGAGCACGTCAACCTACTGTTG  960

321  TRTEDRIVWKTLQGDPWIKQ  340
961  ACACGAACCGAAGATCGTATCGTGTGGAAAACGCTACAAGGAGACCCTTGGATAAAGCAG  1020

341  LTNIPKGNFHATVEEWEAEH  360
1021  CTCACAAACATTCCAAAAGGCAACTTTCACGCCACCGTCGAAGAATGGGAGGCTGAACAC  1080

Figure  8a.  See  legend  on  next  page. 
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JO  I  idrMrKofMrf-ooK  jou 

1081  AAGGGAATAATGGAGGCTATCACTAGCCCGGCCCCCCGCAGCAACCCTTTCAGCTGTAAG  1 140 

.  ■  *.  1  I  I  I  I 

3^1  TNVCWAKALEPILSTAGISL  400 

11'  t  ACAAACGTGTGCTGGGCGAAGGCACTACAACCTATACTATCGACCGCTGCCATATCACTA  1800 

I  i  I  I  I 

401  TGCQUADLFPQFEDDKPHSA  420 

1201  ACT6GATGTCAGTGGGCAGATTTGTTTCCGCAATTTGAAGATGACAAACCACATTCGGCC  1260 

I  I  I  I  I 

421  lYALDVICVKFFGMDLTSGl  440 

1261  ATATACGCTCTAGACGTCATTTGCGTAAAGTTCTTTGGCATGGATTTAACTAGCGGCATA  1320 

I  i  I  I  I 

441  FSKPLIPLTYHPAEGDRKTA  460 

1321  TTTTCAAAACCGTTGATCCCATTGACTTATCACCCCGCCGAAGGGCACCGGAAGACAGCG  1380 

I  I  I  I  I 

461  HWDNSPGQRKYGFDKAVVAE  480 

1381  CACTGGGACAACAGTCCAGGCCAACGAAAGTACGGCTTTGACAAAGCCGTTGTAGCTCAA  1440 

I  I  i  !  i 

481  LSRRFPVFCMADKGVQLDLQ  500 

1441  TTGTCCCGCAGATTCCCAGTATTCTGCATGCCAGACAAACGAGTGCAACTGGACCTACAG  1500 

I  I  I  I  I 

501  TGRTRVV?SRFNLVPFNRNL  520 

1501  ACGGGCCGNACGCGCGTAGTCNCGTCACCCTTCAACCTTGTGCCATTTAACAGAAATCTG  1560 

I  I  I  I  I 

521  PHSLVPEYKTQTPGQLSAFI  540 

1561  CCCCACTCGCTTGTCCCGGAGTATAAAACACAAACTCCAGGTCAGCTAAGCGCCTTTATC  1620 

I  I  I  I  I 

541  RQFKQNTILLVSETPAEHST  560 

1621  CGCCAGTTTAAACAAAACACCATCCTGCTTGTATCTGAAACACCTGCCGAACATTCCACC  1680 

i  I  I  I  I 

561  KSVEWIAPLGTLGATKCYNL  580 

1681  AAATCTGTGGAATGGATTGCACCGCTGGGTACGCTTGGAGCCACCAAATCCTATAATTTA  1740 

I  I  I  I  I 

581  AFGFPPQSRYDLVIINIGTK  600 

1741  GCATTCGGCTTTCCGCCTCAGTCGACGTACGACCTAGTGATCATAAATATCGGTACAAAA  1800 

I  I  I  I  I 

601  FRHHHYQQCEDHAATMKTLS  620 

1801  TTCAGACACCACCACTATCAACAGTGCGAAGACCACGCCGCCACCATGAAGACACTGTCA  1860 

I  I  i  I  I 

621  RSALNCLNPGGTLVVKAYGY  640 

1861  CGTTCCGCCCTTAATTGCCTGAACCCGGGTGGCACATTGGTGGTAAAAGCATATGGCTAC  1920 

I  i  I  I  I 

641  ADRNSEDIITALARKFVRVS  660 

1921  GCGGACAGAAACAGTGAAGACATCATTACAGCCCTGGCACGAAAGTTCGTCAGGGTGTCC  1980 

I  I  I  I  I 

661  AARPQCVSSNTEMYFIFRQL  680 

1981  CCGGCCCGCCCACAGTGCGTCTCAAGCAATACAGAGATCTACTTCATTTTCAGACAACTG  2040 

I  I  I  I  I 

681  DNSRTRQFTPHHLNCVVSSV  700 

2041  GACAACAGCAGAACACGTCAATTCACACCTCATCACCTCAACTGCGTCGTTTCGTCAGTG  2100 

I  I  I  I  I 

701  YEGTRDGVGA  710 

2101  TACGAGGGAACAAGAGACGGAGTTGGTGCT  2130 

i  I 


Figure  8b.  Translated  nucleotide  sequence  of  Whataroa  virus  in  the  region 
encoding  nonstructural  protein  nsP2.  By  homology  with  Sindbis  virus,  the 
sequence  shown  begins  at  amino  acid  97  of  nsP2  and  continues  to  the 
nsP2/nsP3  cleavage  site. 
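
The translation shown in Figures 8a and 8b can be reproduced with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the use of `?` for codons that contain an ambiguous base are illustrative choices, not the procedure used in this report.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become '?'."""
    dna = dna.upper()
    # Ignore any trailing bases that do not form a complete codon.
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "?") for codon in codons)


# Nucleotides 481-540 as printed in Figure 8a.
fragment = "CACGCTGGCACCCTATTGGCCTTAATCGCCATAGTCCGACCTAGAAATAAAGTGGTCCTA"
print(translate(fragment))  # HAGTLLALIAIVRPRNKVVL
```

A codon containing an undetermined base, such as NCG at nucleotides 1522-1524, translates to `?` under this sketch.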
FINRKLYHIAVHGPAKNTEEEQYKAMRAEAADTEYVFDVDKKKCVKREEA 

.V . M . VTK.  .L  .E . R...K.. 

*  *  *  * 
SGLVLVGELTNPPYHEMALEGLKTRPAVPYKVETIGVIGTPGSGKSAIIK 

. S . L . 

♦  ♦  ♦  ♦  * 
NIVTTRDLVTSGKKENCREIEADVLKHRKMQIVSKTVDSVLLNCCHKSVD 

ST.  .A . RL.G.  .  -  T . M . A.E 

***** 

ILYVDEAYACHAGTLLALIAIVRPRNKVVLCGDPKQCGFFNMMQLKVHFN 

V . F . A . K . M . 

***** 

DPERDICTKTFYKYISRRCTQPVTAIVSTLHYNGKMRTTNPCNKNIVIDI 

H.  .K . D.  .  -K . K.  .  -E.  .  . 

***** 

TGQTKPKPGDIILTCFRGWVKQLQIEYPGHEVMTAAVSQGLTRKGVFPVR 

.  ,A . . D . A . YA.  . 

***** 

GKVNENPLYAITSEHVNVLLTRTEDRIVWKTLQGDPWIKQLTNIPKGNFH 

Q . L . P . Q 

***** 

ATVEEWEAEHKGIMEAITSPAPRSNPFSCKTNVCWAKALEPILSTACISL 

..I.D . IA..N..T..A . A....V. 

***** 

TGCQWADLFPQFEDDKPHSAIYALDVICVKFFGMDLTSGIFSKPLIPLTY 

. SE . A . I . L.  .  ,QS . 

***** 

HPAEGDRKTAHWDNSPGQRKYGFDKAVVAELSRRFPVFCMADKGVQLDLQ 

.  .  DSA.PV . T.  .  .  .  Y.H.  lA . QL  ,  C  .  .  T . 

***** 

TGRTRVV?SRFNLVPFNRNLPHSLVPEYKTQTPGQLSAFIRQFKQNTILL 

. ISAQH.  .  .  .V . A . EKQ.  .PVKK.LN.  .  .HHSV.V 

***** 

VSETPAEHSTKSVEWIAPLGTLGATKCYNLAFGFPPQSRYDLVIINIGTK 

.  .  .  EKI  .  APR  .R1 . I.IA..D.N . A . F . 

***** 
FRHHHYQQCEDHAATMKTLSRSALNCLNPGGTLVVKAYGYADRNSEDIIT 

Y.N..F . L . S . VV. 

***** 

ALARKFVRVSAARPQCVSSNTEMYFIFRQLDNSRTRQFTPHHLNCVVSSV 

. D . L . I  .  .  . 

***** 

YEGTRDGVGA 


Figure  9.  Aligned  deduced  amino  acid  sequences  of  the  nonstructural  protein 
regions  of  Whataroa  virus  and  Sindbis  virus,  beginning  with  amino  acid  97  of 
Sindbis  virus  nsP2.  The  upper  sequence  in  each  case  is  Whataroa  virus,  and 
amino  acid  identity  in  the  Sindbis  sequence  is  indicated  with  a  dot. 
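
The dot convention used in Figure 9 can be generated mechanically from two aligned sequences. The sketch below is illustrative only: the function name `dot_identity` is hypothetical, the upper fragment is the first ten Whataroa residues of one row of the figure, and the lower fragment is invented for the example because the Sindbis rows are not legible enough in this copy to quote.

```python
def dot_identity(reference: str, other: str) -> str:
    """Return `other` with residues identical to `reference` shown as '.'."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if a == b else b for a, b in zip(reference, other))


upper = "DPERDICTKT"  # Whataroa residues as printed in Figure 9
lower = "DPEKDICTRT"  # hypothetical second sequence, not taken from the report
print(upper)
print(dot_identity(upper, lower))  # ...K....R.
```

Printing only the differing residues makes conserved stretches easy to see at a glance, which is why the convention is used for closely related sequences.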
1  GCUCCGGCCUAUCGCUCGAAACGUGAGAACAUCGCCGAGUGCCUCGAAGAGGCCGUAGUU  60 
APAYRSKRENIAECLEEAVV 

61  AAUGCCGCGAAUGCACUCG^CGGCCGGGCGAAGGGGUAUGCAAAGCCAUAUAUAAAAAA  120 

NAANALGRPGEGVCKAIYKK 

121  UGGCCUAAUAGUUUCGUCGAUUCCGCGACAGAGACUGGAACGGCUAAGCUAGUGUGCUGU  180 

WPNSFVDSATETGTAKLVCC 

181  CAAGGAAAGAAAAUUAUCCACGCCGUCGGACCCGACUUCCGCAAACACUCCGAGGCAGAA  240 

QGKKIIHAVGPDFRKHSEAE 

241  GCACUGAAGAUUCUCCAGAACACAUACCACGCCAUAGCAGAUUUGGUUAACAAACAUGGA  300 

ALKILQNTYHAIADLVNKHG 

301  AUCAAGACUGUAGCGAUCCCGCUACUAUCCACCGGGAUUUACGCAGCGGGAAAAGACAGA  360 

IKTVAIPLLSTGIYAAGKDR 

361  CUCGAGGUCUCCUUAAACOGUCUUACCACCGCCCUGGACAGAACAGACG^GACGUCACA  420 

LEVSLNCLTTALDRTDADVT 

421  AUCUACUGUCUAGACAAAAAAUGGAAAGAAAGGAUCGAUGCGGUUAUAC^UUGAAGGAG  480 

IYCLDKKWKERIDAVIQLKE 

481  UCGGUGACG(^CUGAAGGAUGAGGAUAU^GAUCGAC^UGAGUUAGUAUGGAUCCAC  540 

SVTELKDEDMEIDDELVWIH 

541  CCGGAUAGUUGUCUCAAGG^GGAAAGGGUAUAGCACAACAAAAGGUAAACUUUAUUCG  600 

PDSCLKGRKGYSTTKGKLYS 

601  UACUUUGAGGGGACUAAGUUUCAUCAGGCAGCAAAAGACAUGGCGGAGAUUAAAGUACUU  660 

YFEGTKFHQAAKDMAEIKVL 

661  UUUCCCGAUGAGCAAGAGUGCAACGAGCAGUUGUGUGCAUACAUCCUUGGUGAAACCAUG  720 

FPDEQECNEQLCAYILGETM 

721  GAAGCCAUCAGGGAAAAAUGUCCAGUGGACUUUAAUCCGUCGUCCAGUCCGCCGAAGACA  780 

EAIREKCPVDFNPSSSPPKT 

781  CUCCCCUGUUUGUGCAUGUAUGCCAUGAC^CUGAGAGAGUGCACCGUCUGCGUAGCAAC  840 

LPCLCMYAMTPERVHRLRSN 

841  AACGUCAAGUCCAUCACAGUGUGUUCGUCUACCCCACUUCCGAAGCACAAGAUCAAGAAC  900 

NVKSITVCSSTPLPKHKIKN 

901  GUUCAGAAAGUACAGUGCACGAAAGUGGUCUUGUUCAAUCCACAGACCCCUGAAUUUGUC  960 

VQKVQCTKVVLFNPQTPEFV 

961  CCUGCCCGUAAGUACAUAGAAGCACAACCAAAAGACGUAAGCCAAGAUGCAGAAGAAAGC  1020 

PARKYIEAQPKDVSQDAEES 

1021  CCUGCCGCAGCCGCCCGAGAUAACACCUCACGGGACGUAACAGACAUAUCCCUGGAUGUG  1080 

PAAAARDNTSRDVTDISLDV 

1081  GAAGAAAGUCAAGCCGCAGCCGGCCAACCAGAGGAGCGCUCGGGGGACAACACUUCCCGG  1140 

EESQAAAGQPEERSGDNTSR 

1141  GAUGUAACA(^UAUAUCCCUAGAUCACGACAGCGAUAGUGAGGUGGGCUCCAUCUUCUCU  1200 

DVTDISLDHDSDSEVGSIFS 

1201  AACCUUAGCUGCUCCAGUCAAUCCAUCACUAGUAUGGACAGCUGGUCCUCCGGACCGGGA  1260 

NLSCSSQSITSMDSWSSGPG 

Figure  10a.  See  legend  at  the  end  of  this  sequence. 
1261  UCGAUCACGAUAAACGAGAACCGCACCAUUCAGGUCACGGCGGAGAUACACAAUGCUCCU 

SITINENRTIQVTAEIHNAP 

1321  GCCGCGUUGCCUGUUCCACCACCACGCCUUAAGAAACOGOIACGCUUAGCAGCCCAGAAG 
AALPVPPPRLKKLARLAAQK 

1381  CCCAAUCCGCCAUCCGACCCGCCUUCGACGGUCGAGGACGUGUCGAUGCGCUUGUCCUUC 
PNPPSDPPSTVEDVSMRLSF 

1441  CCUGCCACGGUGUCGUUCG(^UCAUUCUCCGACGGAGAAGUCGACGACCUUAGCCGCGAU 
PATVSFGSFSDGEVDDLSRD 

1501  AAAGCAGUGUCAGAACCGGUGGUCUUUGGUGCUUUCGAGCCUGGAGAGGUAACCUCUAUC 
KAVSEPVVFGAFEPGEVTSI 

1561  AUCGAAUCAAGGUCUGUCGUGUCAUUCCCCGUGCAUAAACGCCGGCGCA(^GACGGGGC 
IESRSVVSFPVHKRRRRRRG 

1621  AAAAGAACC^UAUUGACUAACCGGGGUAGGUGGGUACAUCUUCUCAACUGACACGGGA 
KRTEY*LTGVGGYIFSTDTG 

1681  CCGGGCCACCUCCAGAAGAAGUCAGUUCUGCAAAACCAGCUUACUGAACCGACCCUCGAG 
PGHLQKKSVLQNQLTEPTLE 

1741  CGCAAUCAAUUAGAACGAAUGUAUGCGCCCAGUCUCGAUGUCAAGAAAGAGGAACUUCUG 
RNQLERMYAPSLDVKKEELL 

1801  AAACUUAAGUACCAAAUGAUGCCCACCGAAGCCAAUAAAAGUAGGUACCAGUCUAGAAAG 
KLKYQMMPTEANKSRYQSRK 

1861  GUUGAAAAUCAAAAAGCGGUAACCACCGA^GGUUACUGUCGGGACUGAAGAUGUACAUC 
VENQKAVTTERLLSGLKMYI 

1921  CACUCAGAG^CCAACCUGAGUGUUAUAA(KUCACUUAUCCGAAACCGUCGUACUCCAGC 
HSENQPECYKVTYPKPSYSS 

1981  AGUGUCCCUCOTJAGUUACCAGAACCCUGAAUUCGCCGUA(k:UGUUUGCAAUAACUACCUG 
SVPLSYQNPEFAVAVCNNYL 

2041  CAUGAGAACUACCCGACGGUUGCCUCCOAUCAGAUUACG^CGAAUAUGAUGCCUACCUC 
HENYPTVASYQITDEYDAYL 

2101  GACAUGGUG^CGGCACUGUUGCGUGUCUCGACACUGCAACAUUCUGCCCUGCGAAAUUA 
DMVDGTVACLDTATFCPAKL 

2161  CGUAGCUUUCCGAAGAAACAUGAGUACCGCGCACCUAACAUCAGGAGUGCCGUGCCGUCU 
RSFPKKHEYRAPNIRSAVPS 

2221  GCUAUGCAGAACACUCUACAGAACGUCCUGAAUGCAGCAACAAAGAGGAAUUGCAACGUU 
AMQNTLQNVLNAATKRNCNV 

2281  ACUCAGAUGAGAGAACUACCGACCCUAGACUCCGCGACCUUUAACGUGGAAUGCUUCCGA 
TQMRELPTLDSATFNVECFR 

2341  AAGUACGCGUGCAAUGACGAGUAUUGGGCUGAAUUCUCCGAAAAACCAAUCAGGAUCACC 
KYACNDEYWAEFSEKPIRIT 

2401  ACGGAGUUUGUUACGGCGUACGUGGCGAGAUUGAAGGGACCAAAGGCUGCUGCUCUGUUU 
TEFVTAYVARLKGPKAAALF 

2461  GCAAAAACGCAUAACCUAGUCCCAUUGCAAGAAGUACCUAUGGACAGGUUUGUGAUGGAC 
AKTHNLVPLQEVPMDRFVMD 

Figure  10b.  See  legend  at  the  end  of  this  sequence. 
2521  AUGAAGCGAGAUGUCAAGGUGACUCCGGGCACAAAACACACCGAAGAAAGGCCUAAGGUG  2580 

MKRDVKVTPGTKHTEERPKV 

2581  CAGGUAAUCCAAGCGGCUGAGCCUUUUGCUACAGCCUACCUUUGUGGCAUCCACCGAGAG  2640 

QVIQAAEPFATAYLCGIHRE 

2641  CUGGUACGCCGGCUUACCGCGGUUCUACUCCCGAACGUACACACCCUGUUUGACAUGUCU  2700 

LVRRLTAVLLPNVHTLFDMS 

2701  GCGGAGGAUUUCGACGCGAUUAUUGCCGA^UUUCCGACAAGGUGACGCCGUGCUCGAG  2760 

AEDFDAIIAEHFRQGDAVLE 

2761  ACAGACAUC(k:GUCAUUCGAUAAGAGUCAGGACGAUGCGAUGGCCCUGACUGGGCUGAUG  2820 

TDIASFDKSQDDAMALTGLM 

2821  AUCCUGGAG^CCUCGGCGUCGAUCAACCGCUGCUGGACCUCAUCGAGUGUGCCUUCGGA  2880 

ILEDLGVDQPLLDLIECAFG 

2881  GAAAUAUCAUCUACGCAUCUGCCUACUGGGACACGGUUUAAGUUCGGCUCAAUGAUGAAA  2940 

EISSTHLPTGTRFKFGSMMK 

2941  UCCGGAAUGUUUCUUACGCUCUUCGUGAACACCAUCUUGAAUGUCGUGAUCGCUAGUCGC  3000 

SGMFLTLFVNTILNVVIASR 

3001  GUGCUUGAGCACAGGUUAACAGGAUCACGAUGUGCCGCAUUCAUUGGAGACGAUAACAUC  3060 

VLEHRLTGSRCAAFIGDDNI 

3061  AUCCACGGCGUGGUAUCAGACAAGGAAAU^CGAAAGGUGCGCCACUUGGCUGAAUAUG  3120 

IHGVVSDKEMAERCATWLNM 

3121  GAGGUAAAAAUCAUUGACGCGGUGAUCGGCGAGCGUCCUCCGUAUUUCUGUGGUGGCUUU  3180 

EVKIIDAVIGERPPYFCGGF 

3181  AUACUACAG^CUCUGUCACCCAAACAGCCUGUCGAGUG(k:UGACCCCCUAAAAAGACUG  3240 

ILQDSVTQTACRVADPLKRL 

3241  UUCAAGCUAGGAAAACCUui'GCCCGCAGAUGAUGACCAA^UGAAGACAGAAGAAGGGCU  3300 

FKLGKPLPADDDQDEDRRRA 

3301  UUGCUGGAUGAGACUAAGGCGUGGUUUAGAGUGGGCAUAACCGAAACAUUGGCUACUGCG  3360 

LLDETKAWFRVGITETLATA 

3361  GUAGCAACGCGGUACGAAGUUGAUAACAUCACGCCUGUCCUGCUGGCACUGAGGACCCUU  3420 

VATRYEVDNITPVLLALRTL 

3421  GCGCAAAGCAAGAGAUCCUUUCAGUCCAUAAGAGGGGAAAUGAAGCAUCUCUACGGUGGU  3480 

AQSKRSFQSIRGEMKHLYGG 

3481  CCUAAAUAG  3489 

P  K  * 

Figure  10c.  Nucleotide  sequence  of  the  region  of  the  genome  encoding 
nonstructural  proteins  nsP3  and  nsP4  of  Sindbis  A1036,  isolated  in  India  in  1953. 
The  sequence  has  been  translated  using  the  single  letter  amino  acid  code. 
1  GCUCCGGCCUACCGCUCGAAACGUGAGAAUAUCGCCGAAUGCCUUGAAGAGGCCGUAGUU  60 
APAYRSKRENIAECLEEAVV 

61  AACGCCGCGAACCCAOJCGf^CGUCCGGGCGAAGGGGUGUGUAAAGCCAUAUAUAAAAAA  120 

NAANPLGRPGEGVCKAIYKK 

121  UGGCCCAAUAGUUUUGUCGAUUCUGCGACAGAGACUGGAACAGCUAAGCUAGUGUGCUGU  180 

WPNSFVDSATETGTAKLVCC 

181  CAAGGAAAAAAGAUUAUCCAUGCCGUCGGACCUGACUUCCGUAAACACCCCGAGGCAGAA  240 

QGKKIIHAVGPDFRKHPEAE 

241  GCGCUGAAGAUUCUCCAGAACACAUACCACGCCAUCGCAGAUUUGGUUAACAAACAUGGA  300 

ALKILQNTYHAIADLVNKHG 

301  AUCAAGACCGUAGCGAUCCCGCUUCUAUCCACCGGGAUUUACGCAGCGG<biAAAGACAGA  360 

IKTVAIPLLSTGIYAAGKDR 

361  CUUGAGGUCUCUUUAAACUIXCUCACUACCGCCCUGGACAGAACUGACGCAGACGUCACA  420 

LEVSLNCLTTALDRTDADVT 

421  AUCUACUGCCUUGACAAAAAAUGGAAAGAACGGAUUGAUGCGUUUAUACAGUUGAAGGAG  480 

IYCLDKKWKERIDAFIQLKE 

481  UCGGUGACGGAACOGAAGGAUGAUGACAUGGAGAUCGAC^CGAAUUAGUAUGGAUCCAC  540 

SVTELKDDDMEIDDELVWIH 

541  CCGGAUAGUUGCCUCAAGG(kjAGGAAAGG(kjUUAGUACGACGAAGGGCAAGCUCUACUCG  600 

PDSCLKGRKGFSTTKGKLYS 

601  UACUUUGAGGGGACUAAAUUUCAUCAAGCAGCAAAAGACAUGGCUGAGAUCAAGGUACUU  660 

YFEGTKFHQAAKDMAEIKVL 

661  UUUCCCGAU^GCAAGAGUGCAACGAGCAACUGUGUGCAUACAUUCUAG^GAAACCAUG  720 

FPDEQECNEQLCAYILGETM 

721  GAAGCCAUCAGGGAAAAAUGUCCAGUGGACOUUAAUCCGUCGUCCAGUCCGCCGAAGACG  780 

EAIREKCPVDFNPSSSPPKT 

781  CUUCCCUGUUUGUGUAUGUACGCCAUGAC(XCCGAGAGAGUGCACCGCUUGCGUAGCAAU  840 

LPCLCMYAMTPERVHRLRSN 

841  AACGUCAAAUCCAUCACAGUAUGCUCGUCAACCCCGCUUCCGAAGCACAAAAUUAAGAAC  900 

NVKSITVCSSTPLPKHKIKN 

901  GUUCAGAAAGUACAGUGCACGAAAGUAGUCCUAUUCAACCCACAAACGCCUGAAUUUGUC  960 

VQKVQCTKVVLFNPQTPEFV 

961  CCUGCCCGciiAGUACAUAGAAACACAACCGAAGGACGACAGUCAAGAGGCGGAAGAAAAC  1020 

PARKYIETQPKDDSQEAEEN 

1021  CCUGCCGCAIXCGAUAACACUUCACGGGAUGUAACAGACGUAUCUCUAGAUGUGGAAGGA  1080 

PAAADNTSRDVTDVSLDVEG 

1081  GAUCGCGUU^GGCCAACCGAUCAGAGGU(k:ACUCAGAG(^CAACACCUCCCGAGAUGUA  1140 

DRVAANRSEVHSEDNTSRDV 

1141  ACAGACAUAUCUCUAGACCACAACAGUGAUAGCGAGGUGGGCUCCAUUUUCUCUGACCUC  1200 

TDISLDHNSDSEVGSIFSDL 

1201  AGCUGCUCCAGUCAUUCCAUCACCAGCAUGGACAGCUGGUCCUCCGGACCGAGCUCGAUC  1260 

SCSSHSITSMDSWSSGPSSI 

Figure  11a.  See  legend  on  the  last  page  of  this  sequence. 
1261  AUGCUAAACGGGAAUCACACCAUCCAGGUCACGGCAGAGAUACACAACGCUCCUGCUGCA 
MLNGNHTIQVTAEIHNAPAA 

1321  CCGCCCGUACCACCACCACGCCUCAAGAAACUGGCGCGCUUGGCAGCUCAGAAGUCCGAU 
PPVPPPRLKKLARLAAQKSD 

1381  CCGCCAUCCAGCCCGCCCUCAACGGUUGAGGACGUGUCGAUGCGCCUGUCAUUCCCUGCC 
PPSSPPSTVEDVSMRLSFPA 

1441  ACGGUGUCAUUCGGAUCUUUUUCUGACGGCGAAGUCGACGAUCUUAGUCGCGAAAAAGCA 
TVSFGSFSDGEVDDLSREKA 

1501  GUGUCAGAACCAGUGGUCUUUGGUGCUUUCGAGCCAGGA(^GGUAACAUCUAUCAUUGAA 
VSEPVVFGAFEPGEVTSIIE 

1561  GCAAGGUCUGUCGUGUCAUUCCCCGUGAAUAAACGCCGGCGCAGGAGACGGGGCCAAAAG 
ARSVVSFPVNKRRRRRRGQK 

1621  AAAACCGAAUAUUGACUAACCGGGGUAGGUGGGUAUAUCUUCUCGACUGACACGGGACCG 
KTEY*LTGVGGYIFSTDTGP 

1681  GGUCACCUCCAGAAAAAAUCGGUUCUACAAAACCAGCUUACGGAACCGACCCUCGAGCGU 
GHLQKKSVLQNQLTEPTLER 

1741  AAUCAAUUAGAACGAGUGUAUGCACCCAGUCUUGAUGCCAAGAAAGAGGAACUCUUGAAA 
NQLERVYAPSLDAKKEELLK 

1801  CUCAAGUACCAAAUGAUGCCCACCGAAGCCAAUAAAAGUAGGUACCAGUCUAGAAAGGUA 
LKYQMMPTEANKSRYQSRKV 

1861  GAAAACCAA^GCCGUAACCACCGAGAGGUUACUGUCGGGAUUGAAGAUGUACAUUCAC 
ENQKAVTTERLLSGLKMYIH 

1921  UCAGAGAACCAACCCGAGUGUUACAAGGUCACCUAUCCGAAACCGUCGUACUCUAGCAGU 
SENQPECYKVTYPKPSYSSS 

1981  GUUCCCCUUAGUUACCAGAGCCCCGAAUJCGCCGUAGCCGUCUGCAAUAACUACCUGCAU 
VPLSYQSPEFAVAVCNNYLH 

2041  GAGAAUUAUCCAACGGUUGCCUCCUAUCA^UUACGGAU^UAUGACGCCUACCUUGAC 
ENYPTVASYQITDEYDAYLD 

2101  AUGGUGGACGGCACCGUAGCGUGUCUCGACACCGCUACAUUXAJGCCCCGCGAAAUUACGC 
MVDGTVACLDTATFCPAKLR 

2161  AGCUUCCCGAAGAAACACGAGUACCGAGAACCUAACAUCAGGAGCGCCGUACCGUCCGCU 
SFPKKHEYREPNIRSAVPSA 

2221  AUGCAGAACACUCUACAGAACGUCCUGAACGCAGCAACAAAGAGGAAUUGCAAUGUUACU 
MQNTLQNVLNAATKRNCNVT 

2281  CAGAUGAGAGAACUACCGACUUUAGACUCCGCAACCUUUAAUGUGGAAU(k:UUUCGAAAG 
QMRELPTLDSATFNVECFRK 

2341  UACGCGUGCAACGACGAGUAUUGGGCUGAAUUCUCCGAAAAACCAAUUA^JAUCACCACA 
YACNDEYWAEFSEKPIRITT 

2401  GAGUUUGUCACGGCGUACGUGGCGAGAUU^GGGACCAAAGGCUGCUGCACUGUUUGCU 
EFVTAYVARLKGPKAAALFA 

2461  AAAACGCAUAACCUAGUCCaVCUGCAAGAAGUACCUAUGGACAGGUUUGUGAUGGACAUG 
KTHNLVPLQEVPMDRFVMDM 

Figure  11b.  See  legend  on  the  last  page  of  this  sequence. 
2521  AAGCGAGACGUUAAGGUGACUCCGGGCACl^GCACACCGAAGAAAGACCCAAAGUGCAG  2580 

KRDVKVTPGTKHTEERPKVQ 

2581  GUAAUCCAAGCGGCAGAGCCUCUAGCUACAGCCUAUUUAUGCGGCAUCCACCGUGAGCUG  2640 

VIQAAEPLATAYLCGIHREL 

2641  GUACGCAGGCUUACCGCAGUCCUGCUUCC^CGUACACACCCUUUUUGAUAUGUCUGCG  2700 

VRRLTAVLLPNVHTLFDMSA 

2701  GAAGAUUUC^UGCUAUCAUUGCCGAGCAUUUUCACCAGGGUGACGCUGUGCUCGAGACA  2760 

EDFDAIIAEHFHQGDAVLET 

2761  GACAUCGCGUCGUUCGAUAAGAGCCAAGACGAUGCGAUGGCCCUGACGGGGCUGAUGAUC  2820 

DIASFDKSQDDAMALTGLMI 

2821  CUGGAGGACCUCGGAGUCGACCAGCCAUU(k:UGGACCUCAUCGAGUGCGCCUUCGGGGAA  2880 

LEDLGVDQPLLDLIECAFGE 

2881  AUAUCAUCUACGCACCUGCCGACCGGGACACGGUUUAAGUUCGGCUCAAUGAUGAAAUCC  2940 

ISSTHLPTGTRFKFGSMMKS 

2941  GGAAUGUUCCUCACGCUCUUUGUGAACACCAUCUUGAAUGUCGUGAUAGCUAGUCGCGUG  3000 

GMFLTLFVNTILNVVIASRV 

3001  CUCGAGCACAGGUUAGCAGAAUCACGAUGCGCCGCAUUCAUCCGAGACGACAAUAUUAUU  3060 

LEHRLAESRCAAFIGDDNII 

3061  CACGGCGUGGUAUCCGAr'iAAGAAAUGGCUGAAAGGUGC(k:CACUUGGCUGAAUAUGGAG  3120 

HGVVSDKEMAERCATWLNME 

3121  GUAAAAAUUALJ"  ^CGCAGUAAUUGGCGAACGUCCUCCGUACUUCUGUGGCGGCUUUAUA  3180 

VKIIDAVIGERPPYFCGGFI 

3181  CUGCAGGACUCAGUCACCCAAACAGCCUGCCGAGUGGCG^CCCCCUAAAAAGAUUGUUC  3240 

LQDSVTQTACRVADPLKRLF 

3241  AAAUUAGGAAAACCAUUACCUGCAGAUGAUGACCAAGAUGAAGACAGAA(^GGGCUCUG  3300 

KLGKPLPADDDQDEDRRRAL 

3301  CUGGAUGAGACCAAGGCGUGGUUUAGAGU^GCAUAACUGAGACACUGGCUACUGCGGUA  3360 

LDETKAWFRVGITETLATAV 

3361  GCAACGCGGUAUGAAGUUGAUAACAUCACACCGGUCCUGCUGGCACUGAGGACCCUUGCG  3420 

ATRYEVDNITPVLLALRTLA 

3421  CAAAGCAAGAGAUCUUUUCAGGCCAUAAG<^GAAAAUGAAGCAUCUCUACGGUGGUCCU  3480 

QSKRSFQAIRGKMKHLYGGP 

3481  AAAUAG  3486 

K  * 


Figure  11c.  Nucleotide  sequence  of  the  region  of  the  genome  encoding 
nonstructural  proteins  nsP3  and  nsP4  of  an  isolate  of  Sindbis  virus  obtained  from  a 
mosquito  pool  from  Australia  in  1975. 
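
The translations in Figures 10 to 12 each show an asterisk inside the reading frame (for example ...KTEY*LTGVGG... in Figure 11b). In Sindbis virus this position is generally described as an opal (UGA) codon lying between nsP3 and nsP4. The following minimal sketch, with a hypothetical helper named `in_frame_stops`, shows one way to locate such codons in a sequence; it is not a procedure described in this report.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def in_frame_stops(rna: str, frame: int = 0):
    """Return (codon_index, codon) for each stop codon in the given frame.

    codon_index is zero-based and counted from the start of the frame.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(frame, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            hits.append(((i - frame) // 3, codon))
    return hits


# Nucleotides 1621-1680 as printed in Figure 11b.
fragment = "AAAACCGAAUAUUGACUAACCGGGGUAGGUGGGUAUAUCUUCUCGACUGACACGGGACCG"
print(in_frame_stops(fragment))  # [(4, 'UGA')]
```

The single hit at codon index 4 corresponds to the asterisk after KTEY in the printed translation.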
1  GCACCGUCAUACCGCACUAAAAGGGAG^CAUUGCUGAUUGUCAAGAGGAAGCAGUUGUC  60 
APSYRTKRENIADCQEEAVV 

61  AAUGCAGCCAAUCCGCUGGGCAGACCAGGCGAAGGAGUCUGCCGUGCCAUCUAUAAACGU  120 

NAANPLGRPGEGVCRAIYKR 

121  UGGCCGAACAGUUUCACCGAUUCAGCCACAGAGACCGGCACCGCAAAACUGACUGUGUGC  180 

WPNSFTDSATETGTAKLTVC 

181  CAAGGAAAG^GUGAUCCACGCGGUUGGCCCUGAUUUCCGGAAACACCCAGAGGCAGAA  240 

QGKKVIHAVGPDFRKHPEAE 

241  GCCCUGAAAUUGCUGCAAAACGCCUACCAUGCAGUGGCAGACUUAGUAAAUGAACAUAAU  300 

ALKLLQNAYHAVADLVNEHN 

301  AUCAAGUCUGUCGCCAUCCCACUGCUAUCUACAGGCAUUUACGCAGCCGGAAAAGACCGC  360 

IKSVAIPLLSTGIYAAGKDR 

361  CUUGAAGUAUCACUUAACUtXUUGACAACCGCGCUAGAUAGAACUGAUGCGGACGUAACC  420 

LEVSLNCLTTALDRTDADVT 

421  AUCUACUGCCUGGAUAAGAAGUGGAAGGAi^GAAUCGACGCGGUGCUCCAACUUAAGGAG  480 

IYCLDKKWKERIDAVLQLKE 

481  UCUGUAACA^GCUGAAGGAUGAGGAUAUGGAGAUCGAC^CGAGUUAGUAUGGAUCCAU  540 

SVTELKDEDMEIDDELVWIH 

541  CCGGACAGUUGCCUGAAGGGAAGAAAGGGAUUCAGUACUACAAAAGGAAAGUUGUAUUCG  600 

PDSCLKGRKGFSTTKGKLYS 

601  UACUUUGAAGGCACCAAAUUCCAUCAAGCAGCAAAAGAUAUGGCGGAGAUAAAGGUCCUG  660 

YFEGTKFHQAAKDMAEIKVL 

661  UUCCCAAAUGACCAGGAAAGCAACGAGCAACUGUGUGCCUACAUAUUGGGGGAGACCAUG  720 

FPNDQESNEQLCAYILGETM 

721  GAAGCAAUCCGCGAAAAAUGCCCGGUCGACCACAACCCGUCGUCUAGCCCGCCAAAAACG  780 

EAIREKCPVDHNPSSSPPKT 

781  CUGCCGUGCCUCUGCAUGUAUGCCAUGA.  s^GCAGAAAGGGUCCACAGACUCAGAAGCAAC  840 

LPCLCMYAMTPERVHRLRSN 

841  AACGUCAAAGAAGUUACAGUAUGCUCCUCCACCCCCCUUCCAAAGUACAAAAUCAAGAAC  900 

NVKEVTVCSSTPLPKYKIKN 

901  GUUCAGAAGGUUCAGUGCACAAAAGUAGUCCUGUUUAACCCGCAUACCCCUGCAUUCGUU  960 

VQKVQCTKVVLFNPHTPAFV 

961  CCCGCCCGUAAGUACAUAGAAGCGCCAGAACAGCCUGCAGCUCCGCCUGCACAGGCCGAG  1020 

PARKYIEAPEQPAAPPAQAE 

1021  GAGGCCCCC(^GUUGCAGCAACACCAACACCACCUGCAGCUGAUAACACCUCGCUUGAU  1080 

EAPEVAATPTPPAADNTSLD 

1081  GUCACGGACAUCUCACUGGACAUGGAAGACAGUAGCGAAGGCUCACUCUUUUCGAGCUUU  1140 

VTDISLDMEDSSEGSLFSSF 

1141  AGCGGAUCG^CAACUCUAUUACUAGOAUGGACAGUUGGUCGUCAGGACCUAGUUCACUA  1200 

SGSDNSITSMDSWSSGPSSL 

1201  GAGAUAGUA^CCGAAGGCAGGUGGUGGUGGCUGACGUCCAUGCCGUCCAAGAGCCUGCC  1260 

EIVDRRQVVVADVHAVQEPA 

Figure  12a.  See  legend  on  last  page  of  this  sequence. 
1261  CCUGUUCCACCGCCAAGGCUAAAGAAGAUGGCCCGCCUGGCAGCGGCAAGAAUGCAGGAA 
PVPPPRLKKMARLAAARMQE 

1321  GAGCCAACUCCACCGGCAAGCACCAGCUCUGCGGACGAGUCCCUUCACcinJUCUUUUGGU 
EPTPPASTSSADESLHLSFG 

1381  GGGGUAUCCAUGUCCUUCG^UCCCUUUUCGACGGAGAGAUGGCCCGCUUGGCAGCGGCA 
GVSMSFGSLFDGEMARLAAA 

1441  CAACCCCCGGCAAGUACAU^CCUACGGAUGUGCCUAUGUCUUUCGGAUCGUUUUCCGAC 
QPPASTCPTDVPMSFGSFSD 

1501  GGAGAGAUU(^GGAGCUGA^CGCAGAGUAACCGAGUCU^GCCCGUCCUGUUUGGGUCA 
GEIEELSRRVTESEPVLFGS 

1561  UUUGAACCGGGCGAAGUGAACUCAAUUAUAUCGUCCCGAUCAGCCGUAUCUUUUCCACCA 
FEPGEVNSIISSRSAVSFPP 

1621  CGCAAGCAGAGACGUAGACKAGGAGCAGGAGGACCGAAUACUGACUAACCGGGGUAGGU 
RKQRRRRRSRRTEY*LTGVG 

1681  GGGUACAUAUUUUCGACGGACACAGGCCCUGGGCACUUGaW^GAAGUCCGUUCUGCAG 
GYIFSTDTGPGHLQKKSVLQ 

1741  AACCAGCUUACAGAACCGACCUUGGAGCGCAAUGUUCUGGAAAGAAUCUACGCCCCGGUG 
NQLTEPTLERNVLERIYAPV 

1801  CUCGACACGUCGAAAGAGGAACAGCUCAAACUCAGGUACCAGAUGAUGCCCACCGAAGCC 
LDTSKEEQLKLRYQMMPTEA 

1861  AACAAAAGCAGGUACCAGUCUAGAAAAGUAGAAAAUCAGAAAGCCAUAACCACUGAGCGA 
NKSRYQSRKVENQKAITTER 

1921  CUGCUUUCAGGGCUACGACUGUAUAACUCUGCCACAGAUCAGCCAGAAUGCUAUAAGAUC 
LLSGLRLYNSATDQPECYKI 

1981  ACCUACCCGAAACCAUCGUAUUCCAGCAGUGUACCGGCGAACUACUCUGACCCAAAGUUU 
TYPKPSYSSSVPANYSDPKF 

2041  GCUGUAGCUGUUUGCAACAACUAUCUGCAUGAGAAUUACCCGACGGUAGCAUCUUAUCAG 
AVAVCNNYLHENYPTVASYQ 

2101  AUCACCGACGAGUACGAUGCUUACUUGGAUAUGGUAGACGGGACAGUCGCUUGUCUAGAU 
ITDEYDAYLDMVDGTVACLD 

2161  ACUGCAACUUUUUGCCCCGCCAAGCUUAGAAGUUACCCGAAAAGACACGAGUAUAGAGCC 
TATFCPAKLRSYPKRHEYRA 

2221  CCAAACAUCCGCAGUGCGGUUCCAUCAGCGAUGCAGAACACGUUGCAAAACGUGCUCAUU 
PNIRSAVPSAMQNTLQNVLI 

2281  GCCGCGACUAAAAGAAACUGCAACGUCACACAAAUGCGUGAAUUGCCAACACUGGACUCA 
AATKRNCNVTQMRELPTLDS 

2341  GCGACAUUCAACGUUGAAUGCUUUCGAAAAUAUGCAUGUAAUGACGAGUAUUGGGAGGAG 
ATFNVECFRKYACNDEYWEE 

2401  UUUGCCCGAAAGCCAAUUAGGAUCACUACUGAGUUCGUUACCGCAUACGUGGCCAGACUG 
FARKPIRITTEFVTAYVARL 

2461  AAAGGCCCUAAGGCCGCCG^CUGUUCGCAAAGACGCAUAAUUUGGUCCCAUUGCAAGAA 
KGPKAAALFAKTHNLVPLQE 

Figure  12b.  See  legend  on  the  last  page  of  this  sequence. 
2521  GUGCCUAUGGAUAGGUUCGUCAUGGACAUGAAAAGAGACGUGAAAGUUACACCUGGCACG  2580 

VPMDRFVMDMKRDVKVTPGT 

2581  AAACACACAGAAGAAAGACCGAAAGUACAAGUGAUACAAGCCGCAGAACCCCUGGCGACC  2640 

KHTEERPKVQVIQAAEPLAT 

2641  GCUUACCUGUGCGGGAUCCACCGGGAGUUAGUGCGCAGGCUUACAGCCGUCUUGCUACCC  2700 

AYLCGIHRELVRRLTAVLLP 

2701  AACAUUCACACGCUUUUUGACAUGUCGGCGGAGGACUlU^UGCAAUCAUAGCAGAACAvG  2760 

NIHTLFDMSAEDFDAIIAEH 

2761  UUCAAGCAAGGUGACCCGGUACUGGAGACGGAUAUCGCCUCGUUCGACAAAAGCCAAGAC  2820 

FKQGDPVLETDIASFDKSQD 

2821  GACGCUAUG(k;GUUAACUG^CUGAUGAUCUUGGAAGACCUGGGUGUGGACCAACCACUA  2880 

DAMALTGLMILEDLGVDQPL 

2881  CUCGACUUGAUCGAGUGCGCCUUUGGAGAi^UAUCAUCCACCCAUCUGCCCACGGGUACC  2940 

LDLIECAFGEISSTHLPTGT 

2941  CGUUUCAAAUUCGGGGCGAUGAUGAAAUCCGGAAUGUUCCUCACGCUCUUUGUCAACACA  3000 

RFKFGAMMKSGMFLTLFVNT 

3001  GUUCUGAAUGUCGUUAUCGCCAGCAGAGUAUUGGAGGAGCGGCUUAAAACGUCCAAAUGU  3060 

VLNVVIASRVLEERLKTSKC 

3061  GCAGCAUUUAUCGGCGACGACAACAUCAUACACGGAGUAGUAUCUGACAAAGAAAUGGCU  3120 

AAFIGDDNIIHGVVSDKEMA 

3121  GAGAGGUGUGGCACCUGGCUCAACAUGGA^UUAAGAUCAUUGACGCAGUCAUCGGCGAG  3180 

ERCATWLNMEVKIIDAVIGE 

3181  AGACCGCCUUACUUCUGCGGUGGAUUCAUCUUGCAAGAUUCGGUUACCUCCACAGCGUGU  3240 

RPPYFCGGFILQDSVTSTAC 

3241  CGCGUGGCGGACCCCUUGAAAAGGCUGUUUAAGUUGGGUAAACCGCUCCCAGCCGACGAC  3300 

RVADPLKRLFKLGKPLPADD 

3301  GAGCAAGAC^GACAGAA^CGCGCUCU^UAGAUGAAACAAAGGCGUGGUUUAGAGUA  3360 

EQDEDRRRALLDETKAWFRV 

3361  GGUAUAACA^CACCUUAGCAGUGGCCGU^CAACUCGGUAUGAGGUAGACAACAUCACA  3420 

GITDTLAVAVATRYEVDNIT 

3421  CCUGUCCUGCUGGCAUUGA^CUUUUGCCCAGAGCAAAAGAGCAUUUCAAGCCAUCAGA  3480 

PVLLALRTFAQSKRAFQAIR 

3481  GGGGAAAUAAAGCAUCUCUACGGUGGUCCUAAAUAG  3516 

GEIKHLYGGPK* 

Figure  12c.  Nucleotide  sequence  of  the  region  of  the  genome  encoding 
nonstructural  proteins  nsP3  and  nsP4  for  the  Girdwood  South  African  strain  of 
Sindbis  virus  isolated  in  1963. 
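
Because Figures 10 to 12 give both nucleotides and translations, nucleotide differences between strains can be sorted into those that change the encoded residue and those that do not. The sketch below compares the first 60 nucleotides of Figure 10a (A1036) and Figure 11a (the Australian isolate) codon by codon. The function name `classify_codon_changes` is hypothetical, the standard genetic code is assumed, and the characters printed as "AOC" in this copy of Figure 10a are read as "AUC" to agree with the isoleucine in the printed translation.

```python
from itertools import product

# Standard genetic code in UCAG order (RNA alphabet).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_changes(rna_a: str, rna_b: str):
    """Compare two in-frame, equal-length RNAs codon by codon.

    Returns a list of (codon_number, codon_a, codon_b, kind), where kind is
    'silent' if both codons encode the same residue and 'replacement' otherwise.
    """
    if len(rna_a) != len(rna_b) or len(rna_a) % 3:
        raise ValueError("expected equal-length sequences in whole codons")
    changes = []
    for i in range(0, len(rna_a), 3):
        a, b = rna_a[i:i + 3], rna_b[i:i + 3]
        if a != b:
            kind = "silent" if CODON_TABLE[a] == CODON_TABLE[b] else "replacement"
            changes.append((i // 3 + 1, a, b, kind))
    return changes


# Nucleotides 1-60 of Figure 10a and Figure 11a ("AOC" read as "AUC", an assumption).
a1036 = "GCUCCGGCCUAUCGCUCGAAACGUGAGAACAUCGCCGAGUGCCUCGAAGAGGCCGUAGUU"
australia = "GCUCCGGCCUACCGCUCGAAACGUGAGAAUAUCGCCGAAUGCCUUGAAGAGGCCGUAGUU"
for change in classify_codon_changes(a1036, australia):
    print(change)
```

In this 20-codon window the two strains differ at four codons (4, 10, 13 and 15), and all four changes are silent, consistent with the identical translations printed under the two sequences.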
viral disease. These viruses are all closely related, exhibiting 90% or greater
amino acid sequence identity in the conserved region of nsP3 or in nsP4.
Conclusions as to sequence relationships are similar to those drawn from
the analysis of the 3' NTR.
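
Percent identity figures such as those quoted above are computed over aligned positions. The following is a minimal sketch, assuming an ungapped alignment and a hypothetical helper named `percent_identity`; it uses only the first 20 residues of nsP3 printed in Figures 10a and 12a, so the value it prints describes that short window and not the conserved-region comparison reported in the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# First 20 residues of nsP3 as printed in Figures 10a and 12a.
a1036 = "APAYRSKRENIAECLEEAVV"
girdwood = "APSYRTKRENIADCQEEAVV"
print(f"{percent_identity(a1036, girdwood):.1f}%")  # 80.0%
```

A full comparison would first align the complete proteins, including gaps, and then decide whether gapped columns count toward the denominator; that choice is left open here.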


CONCLUSIONS 

We  have  identified  an  important  antigenic  epitope  present  in  E2  of  the 
alphaviruses.  This  epitope,  located  in  whole  or  in  part  within  the  domain  of  E2 
between  residues  170  and  220,  depending  upon  the  antibody,  is  clearly  of  major 
importance  for  the  neutralization  of  virus  infectivity  and  thus  for  vaccine 
design. 

We  have  established  the  relationships  between  many  of  the  Sindbis-like 
alphaviruses.  The  Sindbis-like  viruses,  which  are  found  throughout  the  Old 
World  from  Northern  Europe  to  Africa,  India,  the  Philippines  and  the 
Australasian  region  including  New  Guinea,  are  a  clearly  identifiable  group  of 
viruses. They share a minimum of 80% amino acid sequence identity in the
nonstructural proteins and possess a characteristic and conserved 3' NTR.
Virulent strains exist that can cause significant disease in man, and the
relationship of the virulent strains to avirulent strains has been established. It is
of  considerable  interest  that  viruses  belonging  to  this  group  coexist  in  many  parts 
of  the  world  with  other  alphaviruses  that  are  demonstrably  different  in  their 
epidemiology,  serology,  organization  of  the  3'  NTR,  and  evolutionary  history,  even 
though  many  of  these  non-Sindbis  alphaviruses  cause  diseases  very  similar  to 
those  caused  by  the  virulent  Sindbis-like  viruses. 

We found that a strain of Sindbis virus from Northern Europe that causes
Ockelbo disease in Sweden, Pogosta disease in Finland, or Karelian fever in
Russia, a disease characterized by a polyarthritis whose symptoms can persist for
months or years, is very closely related to pathogenic strains of Sindbis virus
isolated from South Africa. We concluded that a South African strain of Sindbis
was introduced into Northern Europe, probably in the 1960s, where it continues to
cause epidemics of a significant human disease (Shirako et al., 1991).

We  have  shown  that  Aura  virus  is  a  New  World  representative  of  the 
Sindbis  viruses.  Further  analysis  is  required  to  determine  whether  it  is  one  of  the 
parents  of  Western  equine  encephalitis  virus,  but  the  hypothesis  that  Western 
equine  encephalitis  virus  is  a  virus  that  emerged  from  a  recombination  event  has 
received  further  support  from  these  studies. 

We have also shown that high-throughput automated DNA sequencing is
ideally suited to the rapid analysis of an RNA virus family such as the
alphaviruses. These procedures generate large amounts of useful
information very quickly and would be very useful in defining the
origin and spread of an epidemic virus.